HSCs Arise in Flames

PAGE 1070 

Hematopoietic stem cells (HSCs) arise from the transdifferentiation of aortic 
endothelium during a brief period of embryonic development. Espin-Palazon 
et al. show that proinflammatory signaling by primitive neutrophils to the
aortic endothelium, in the absence of infection, regulates HSC emergence.



Whose Origins Fire First? 

PAGE 1015 

Rates of mutation during DNA replication are correlated with the timing of
replication origin firing. Koren et al. discover that replication timing varies
among humans and identify genetic variants that are cis-acting modifiers of
replication timing, gene expression levels, and local mutation rates, thus 
revealing an unexpected source of variation between individuals. 



Endosome DividER 

PAGE 1027 

Endocytic cargo and Rab GTPases are segregated to distinct domains of an endosome that maintain its identity until 
undergoing fission to traffic cargo. Rowland et al. now reveal that ER tubules define the position and timing of endosome 
fission. 



Substrate Cen-TRiC Insights 

PAGE 1042 

Using a combination of computational and experimental approaches, Joachimiak et al. identify the substrate-binding site as 
well as the molecular basis of substrate recognition of the hetero-oligomeric eukaryotic chaperonin TRiC. Each TRiC subunit 
is found to bear a distinct, evolutionarily conserved pattern of polar and hydrophobic residues that specifies the recognition of 
discrete substrate motifs. This combinatorial recognition of polypeptides broadens TRiC specificity and may direct its unique 
ability to fold obligate substrates. 



Antirecycling Agenda for Yeast 

PAGE 1056 

Yeast cells never re-establish polarity at the same site as was used in a previous cell division. Meitinger et al. show that these
sites, called cytokinesis remnants, are stably loaded with inhibitory polarity cues that prevent Cdc42-mediated growth and
forestall premature replicative aging.



C-RACKing Open Viral Replication

PAGE 1086 

Majzoub et al. show that the ribosomal protein RACK1 plays a critical role in the 
translation of IRES-dependent viral RNAs but is not essential for cell viability 
and proliferation. These findings reveal that RACK1 has selective functions in 
mRNA translation and is a target for the development of antivirals targeting
IRES-dependent viruses, such as hepatitis C. 



LncRNA Wires Up Cancer Metastasis 

PAGE 1110 

Xing et al. show that the lncRNA BCAR4 contributes to breast cancer metastasis
and elucidate how it triggers Hedgehog target genes in response to
chemokine signals. The mechanism involves the coordinate binding of BCAR4
to two distinct transcriptional regulators and the direct and indirect effects of 
these interactions on oncogenic gene expression. 
Enhancers Work around the Clock

PAGE 1140 

By examining enhancer RNAs that cluster in specific circadian phases, Fang et al. reveal 
the functional component of transcription factor cistromes from mouse liver. This anal- 
ysis provides global insight into the partitioning of circadian gene expression, including 
the revelation that a single transcription factor controls opposing circadian phases. 

Gag and RNA Get into an Open Relationship 

PAGE 1096 

The HIV-1 RNA-binding protein Gag orchestrates all steps of virion genesis, including 
membrane targeting and RNA recruitment into virions. Now, Kutluay et al. report that 
Gag-RNA-binding specificity dramatically and transiently changes during virion genesis, 
facilitating viral genome packaging. 

Super-Enhancer Super-Inhibitor 

PAGE 1126 

MYC oncoproteins are thought to stimulate tumor cell growth and proliferation through amplification of gene transcription. 
Chipumuro et al. report that a covalent inhibitor of the kinase CDK7 induces regression in a mouse model of neuroblastoma 
through massive downregulation of MYC-induced transcriptional amplification by preferentially disrupting super-enhancer- 
associated genes. 

A Matrix of Neo-Drug Interactions 

PAGE 1168 

The yeast Cryptococcus neoformans is the major fungal cause of meningitis as well as a driver of mortality in HIV/AIDS patients.
Brown et al. perform a large-scale investigation of chemical-genetic interactions in C. neoformans and exploit these data to delin- 
eate numerous genes involved in the biosynthesis of capsular polysaccharide, a central virulence trait, to identify the cell cycle as 
a key target of an antifungal compound, and to develop an algorithm that successfully predicts synergistic drug interactions. 

Green Glutamine Sensor 

PAGE 1188 

Plants have long been thought to lack glutamine sensors. Chellamuthu et al. now show that the plastid-localized PII signaling protein
is a widely used glutamine sensor in the plant kingdom. Its short C-terminal extension forms a low-affinity glutamine-binding
site, which is curiously lacking in Brassicaceae, the family of flowering plants that includes the extensively studied model
Arabidopsis thaliana.

Ribosomes Stop and Drop 

PAGE 1200 

During translation of messenger RNAs, ribosomes pause at some codons but not others. Subramaniam et al. combine exper- 
imental and computational approaches to show how nutrient status, reflected in tRNA aminoacylation, promotes or pauses 
translation elongation and show that aborted translation regulates ribosome density. 

The Human Network 

PAGE 1212 

Rolland et al. create a systematic map of ~14,000 high-quality human binary protein-protein
interactions to provide proteome-wide coverage of the human interactome
network. The map uncovers significant interconnectivity between known and candidate 
cancer gene products, providing unbiased evidence for an expanded functional human 
disease landscape. 

Dicer Is a Picky Eater 

PAGE 1153 

Dicer is known for its role in the biogenesis of eukaryotic small RNAs/microRNAs. Rybak- 
Wolf et al. now map transcriptome-wide Dicer-binding sites in human cells and C. elegans 
and identify hundreds of new miRNA, structural RNA, promoter RNA, and mitochondrial 
transcript targets. Interestingly, most Dicer-binding sites on mRNAs/lncRNAs are not
significantly processed into small RNAs, but instead act to sequester Dicer. 
History Lessons from Bacteria



Microbes have been catapulted into the spotlight again by 
recent studies showing that uniquely modern phenomena, 
like jet lag and artificial sweeteners, can affect health by 
altering the microbiome (Suez et al., 2014; Thaiss et al., 
2014). However, even as the nature of our current relationship
with bacteria unfolds, DNA sequencing and proteomic
approaches are giving us a fascinating glimpse into the his- 
tory of human-bacterial interactions. By dusting off ancient 
human remains, recent work has uncovered some interesting 
findings about our ancestors’ relationship with microbes. 




One example involves the spread of tuberculosis (TB) to the
New World. Mycobacterium tuberculosis strains currently
found in the Americas are related to those in Europe, suggesting
European origins, but archeological evidence for
tuberculosis in the New World prior to European contact suggests
that TB was not spread through early settlers or tradesmen.
Another possible route is through human migration
across the Bering land bridge during the Pleistocene era,
but if this were the case, the dominance of the European lineage
in the Americas would present yet another unexplained
puzzle. How, then, did TB manage to get to the New World?
A recent study led by Johannes Krause (Bos et al., 2014),
whose previous work includes genome sequencing of
ancient bacteria causing leprosy and the plague (Schuenemann
et al., 2013; Bos et al., 2011), sheds some light on
this historical-epidemiological mystery.

The authors isolated M. tuberculosis DNA from 1,000-year-old
Peruvian human skeletons and compared the sequences
to those previously identified from an 18th century Hungarian
mummy (Chan et al., 2013), to modern human TB strains, and
to related bacteria from a number of different animals.
Surprisingly, they found that the Peruvian M. tuberculosis
strains were not most closely related to the other human
strains, as might have been expected. Instead, they clustered
with bacterial strains found in seals populating the
southern hemisphere. Yes, seals. Given that hunting likely
afforded ample opportunity for contact between people
and seals, this suggests a rather astonishing answer to
how TB may have spread from Europe to the Americas: by
hitching a ride across the ocean in marine mammals.

Moche jar with fine line painting and figural seal on top. Moche culture,
Peru, Phase III, 4th-6th century AD. Linden-Museum Stuttgart; photograph:
Anatol Dreyer; Inventory-Number: 119020.

Not only has the sequencing of ancient DNA uncovered
how a deadly pathogen may have colonized the New World
long before Columbus did, but it’s also giving us insights
into ancient microbes that stuck much closer to home. The
oral cavity is home to a rich community of bacteria, whose
DNA can be encased and preserved within dental plaque
for thousands of years. Since our ancestors didn’t benefit
from regular visits to the dental hygienist, this provides a
source of material from which ancient oral microbiomes
can be examined and has permitted work showing that
with the advent of agriculture came a shift in the oral bacteria
that cause gum disease (Adler et al., 2013). A study earlier
this year, led by Enrico Cappellini, looks into the more recent
past by sampling the dental plaque from German medieval
skeletons (Warinner et al., 2014).

Fossilized plaque on teeth of middle-aged man from Dalheim, Germany,
ca. AD 1100. Photo credit: Christina Warinner.

The study combines DNA sequencing and proteomic approaches
to paint a picture of microbial pathogenicity, host
immunity, and diet in the Middle Ages. Despite changes in
diet and dental hygiene since that time, it seems that we
share some of the same pathogenic oral bacteria with our
medieval counterparts. In addition, the molecular machinery
allowing bacteria to mount broad, low-level resistance to antibiotics
was present even back then, as were host proteins
involved in the innate immune response. Remarkably, some 
molecular remnants of food were also preserved in the 
plaque. Although leafy greens don’t fossilize well, the study 
presents paleobiological evidence that the medieval diet 
included broccoli-related vegetables, among other food- 
stuffs. Given that diet can affect microbiome composition, 
the accessibility of ancient dietary habits to genomic and 
proteomic analysis may help us better understand how 
changing nutrition has impacted the microbiome throughout 
history. 

As DNA sequencing technologies continue to advance, 
we’ll likely get an even closer look at the complex ways in 
which human and microbial histories have intertwined. In 
addition, given that proteins tend to be more stable than 
DNA, proteomic approaches may open a window even 
farther into the past than we’ve been able to access until 
now. Apart from giving us a slightly apprehensive view of 
our pinniped friends, or a sense of common humanity with
the ancients the next time we eat some broccoli, moving 
forward, these kinds of studies promise to give us a more 
complete picture of the effects of microbes on human health, 
both past and present. 
Credibility and Reproducibility






Credibility is everything for science, and it is built over time in 
both obvious and subtle ways. It is how we interact with col- 
leagues and collaborators. It is how generously and openly we 
share reagents and how we mentor students and postdocs. It 
is how we review each other’s papers, and it is how we credit 
others’ work. It is the way we educate and inform the public 
that funds us. It is the way we document and store our data. 
And it is the rigor, transparency, and attention we invest in 
designing, conducting, and reporting experiments. Without 
credibility, others can’t/won’t build on our work, and as a result, 
the pace of scientific advance is slowed. Most importantly, sci- 
ence contaminated with a lack of credibility is a house with crum- 
bling walls that engenders little trust and provides minimal value 
to our global society, present and future. 

But everyone reading this already knows the importance of 
credibility in science, so why are we discussing it here? Within 
the last 12 months the reproducibility of science, a lynchpin of 
credibility, has come under intense scrutiny, both from the NIH 
(Nature 505, 612-613) and other government funding bodies, 
as well as in the lay (The Economist, October 19, 2013, 23-28)
and scientific press (Nature 483, 531-533), though many of
these reports themselves would benefit from greater transparency
in reporting and still require robust demonstrations of
reproducibility. Hearing the word “reproducibility,” most of us
think immediately of fraud or data and image manipulation, but 
it is much broader than that. Many of the current concerns about 
reproducibility, particularly the successful scalability of preclini- 
cal data into robust drug targets for treating human disease, are 
focused on the rigor of the experimental design (inclusion of all 
appropriate controls, blinded experimental conditions, gender 
balance in experimental populations, a priori determination of 
n’s and statistical power, appropriate statistical analyses, etc.) 
and on complete transparency in reporting of these parameters 
and all collected data (for a recent Perspective on this topic, see 
Neuron 84, 572-581).

In June of this year Francis Collins, NIH Director, Marcia 
McNutt, Editor-in-Chief of Science, and Philip Campbell, Edi- 
tor-in-Chief of Nature, organized a meeting of journal editors 
and other contributors to collaborate on approaches to ensuring 
and improving reproducibility. Maximizing reproducibility clearly 
is an initiative involving many stakeholders, with scientists front 
and center and funding bodies, universities, journals, pharma- 
ceutical and biotech companies, patient advocacy groups, and 
society at large all taking a leading role as well. Out of the discus- 
sions at this meeting came a set of recommendations for how 
journals and journal editors can do their part. The main focus 
of the guidelines is to ensure rigorous experimental design and 
transparency in reporting the specifics about how experiments 
were performed and how data were collected and analyzed. 
Cell Press participated in the meeting and is a signatory on the 
recommendations that were recently posted
(http://www.nih.gov/about/reporting-preclinical-research.htm). Cell and its
sister journals are already doing many of the items in the guidelines
and have been doing so for quite some time (providing space
for lengthy methods sections in print and unlimited supplemental
methods online, requiring the sharing of reagents as a condition
of publication, providing a forum for refutation in our Matters
Arising format, and requiring authors to clearly state their statistical
measures). Other items in the guidelines, like developing a way
to facilitate clear reporting in the paper of details about how ex- 
periments were designed and performed, will be valuable addi- 
tions to what we already do. Journals are encouraged to adopt 
a checklist of specific reporting criteria as a standard form for au- 
thors to complete and editors and/or reviewers to verify. While 
we at Cell and the other Cell Press journals are not yet sure 
that an author checklist per se will be the most effective imple- 
mentation for our authors, reviewers, and readers, we do wholly 
embrace the importance of the goals of the guidelines and will be 
taking steps to adapt our editorial processes and author instructions
to ensure consistent standards for appropriate experimental
design and transparency in reporting. For example,
Developmental Cell has recently introduced supplemental protocols,
where authors of a paper with noteworthy, new, or particularly
challenging methods are encouraged to provide a detailed
protocol in a separate supplemental PDF. We view these steps 
as an important part of the value that we add through the editorial 
and peer review process. 

Enhanced attention to these elements will also help protect the 
authors’ credibility. With increased clarity about how experiments
were performed and how data were collected, editors, authors, and reviewers
will all be better able to spot and rectify concerns before
the paper is published, hopefully reducing the number of corrections
and retractions required postpublication. To this end, Cell
and our sister journals are also introducing an image screening 
process to help ensure adherence to community standards as 
outlined in our data processing policies. More and more, we 
are finding that the concerns that arise regarding published 
data are often the result of avoidable errors, such as
copying and pasting the same image into two different figures
or failing to indicate where lanes of a gel have been spliced
together. (Oddly, the most pervasive challenges to published
data we see at Cell relate to loading controls. There seems to 
be some misalignment among scientists regarding the impor- 
tance and meaning of the actin bands in a standard western 
blot.) When potential problems are brought to our attention by 
a concerned reader, we ask the authors to provide us with the 
original unprocessed data, together with a detailed explanation 
of how they conducted the experiment. Most of the time, we 
can see from the raw data that the problems have been intro- 
duced through simple mistakes and can be addressed with an 
erratum. But a scientific literature peppered with corrections 
does not build credibility, and worse still is when the avoidable 
errors are sufficiently extensive that they undermine the reliability 
of the entire body of work and necessitate a retraction. So, as we 
at Cell invest in checking figures and working with authors to fix 
any correctable mistakes before the paper is published, we ask 
that authors renew their focus on preparing their manuscripts
and reviewing the final figures with the same attentive eyes their 
readers will. By combining enhanced clarity of reporting as rec- 
ommended by the new guidelines with prepublication image 
screening, our intent is to ensure that every paper we publish 
meets not only the highest standards of interest and importance 
but also of credibility and reproducibility. 



With increased vigilance from authors, funders, and journals 
and attention to standards for experimental design and accurate,
careful reporting, we will collectively increase the public trust and
support for research and build a stronger pipeline for converting 
our understanding of the basic processes and mechanisms of 
biology into improved diagnostics, treatments, and potentially 
cures for the myriad of global health challenges. 

Emilie Marcus 



http://dx.doi.org/10.1016/j.cell.2014.11.016
What’s Your Favorite Crystal Structure?



Signaling Helix 




Hao Wu 

Harvard Medical School 



I stumbled into the crystal structure of Ire1 
a few years ago when my lab had just 
solved the structures of several helical 
signaling oligomers formed by death do- 
mains. I was curious if others had 
observed similar structures in their sys- 
tems and whether these higher-order olig- 
omers might tell us something new about 
molecular mechanisms in signal trans- 
duction. Ire1 is a receptor critical for the 
unfolded protein response (UPR) in the
endoplasmic reticulum (ER). It contains 
an ER luminal region that senses aber- 
rantly folded proteins and a cytoplasmic 
region with a kinase domain and an 
RNase domain. Upon activation, the Ire1 
RNase mediates nonconventional mRNA 
splicing to enable expression of transcription
factors that activate the UPR.

The crystal structure of the Ire1 cyto- 
solic domain reveals a striking oligomeric 
assembly, corroborating the observed 
activation of the Ire1 RNase in solution 
by oligomerization (Nature [2009]. 457, 
687-693). Symmetric back-to-back Ire1 
dimers arrange into a filament through 
helical symmetry. Each kinase domain 
extends its activation loop to the next 
kinase domain, resembling a trans-autophosphorylation
reaction. Remarkably,
the RNase active site that is disordered 
in a dimer structure becomes ordered in 
the oligomer to create the mRNA-binding 
pocket. The oligomerization-promoted 
allosteric activation of the RNase made 
perfect sense to me, and an analogous 
mechanism may bring about kinase and 
caspase activation in the context of innate 
immune signaling. A picture is worth a 
thousand words. 



A Single Base Pair 




Shigeyuki Yokoyama 

RIKEN and The University of Tokyo 

Reading the genetic code, which de- 
scribes the rules relating amino acids to 
codons, relies on strict molecular recogni- 
tion of both amino acids and tRNAs 
by aminoacyl-tRNA synthetases. Amino- 
acyl-tRNA synthetases must recognize 
key nucleotides far from the 3' terminal 
adenosine that is ligated to the amino 
acid. The structures of alanyl-tRNA synthetase
(AlaRS) in complex with tRNA(Ala)
revealed how AlaRS selects tRNA(Ala) depending
just on a single base pair,
G3·U70 (Nature [2014]. 510, 507-511),
and thereby unraveled what had been a
mystery since 1988. A variant of tRNA(Ala)
with A3·U70 can bind to AlaRS with
nearly the same affinity, but aminoacylation
is about a hundred times slower than
wild-type with G3·U70. The AlaRS·tRNA
structure of the A3·U70 variant showed
that the CCA sequence is bound in a site
far removed from the catalytic site,
which traps the 3'-adenosine. Moreover,
G3·U70 is able to control the direction
of the 3'-CCA. It is surprising that such 
high translation specificity is achieved 
by this unprecedented mechanism. The 
finding of this nonproductive binding 
mechanism may cause a paradigm shift 
in enzymology of not only aminoacyl- 
tRNA synthetases, but also other highly 
specific enzymes. Furthermore, rational 
engineering of aminoacyl-tRNA synthe- 
tases toward novel specificities for tRNAs 
may become possible in the field of syn- 
thetic biology, utilizing unnatural amino 
acids and base pairs in the central dogma. 



That Exact Moment 




Karolin Luger 

Colorado State University and HHMI

Everybody’s all-time favorite structure 
has to be the DNA double helix because 
it exemplifies everything a high-impact 
structure should do: be aesthetically 
pleasing and explain complex biology. 
After this obvious choice, with full aware- 
ness of my bias, my favorite structure is 
the nucleosome. Obviously, the fact that 
I participated for 8 long years in the 
arduous process leading to a high-resolu- 
tion model of the repeating unit of eukary- 
otic chromatin affects my choice. But 
on an objective scale, the nucleosome’s 
beauty lies in its inherent 2-fold symmetry, 
its aesthetically pleasing proportions, and 
the way in which intricately intertwined 
histone helices generate a ramp that 
gently guides the DNA double helix into 
a superhelix. More importantly, the struc- 
ture beautifully explains its function in 
organizing the entire eukaryotic genome: 
the histone octamer holds the DNA with 
just enough force to bend it around its 
perimeter yet allows its dissociation for 
key biological processes to take place. 
Its modular design lends itself to a dy- 
namic response to regulate transcription, 
replication, and repair. Finally, the histone 
tails, the sites of numerous posttransla- 
tional modifications, are freely accessible 
for signaling and for promoting internu- 
cleosomal interactions (Nature [1997]. 
389, 251-260). The entire structure had 
to be manually fit into the electron density 
map, and so there was no single moment 
at which I first “saw” the nucleosome. But 
when I projected the structure for the first 
time at a Cold Spring Harbor meeting, it 
was greeted with a collective gasp. I 
remember that exact moment to this day. 
The 3D Beauty of RNA






Jennifer Doudna 

UC Berkeley and HHMI 

As a graduate student in the late 1980s, 
with a backdrop of Madonna tunes and 
post-punk fashion, I was fascinated by 
the hypothesis that RNA played a starring 
role during early evolution. Perhaps this 
agile molecule had both stored genetic 
information and catalyzed the chemical 
reactions that ensured its replication, 
enabling evolution to get going. Thinking 
that the three-dimensional structure of 
RNA was key to understanding the func- 
tion and possible origins of ribozymes, 
I worked with then-student Jamie Cate 
to determine the crystal structure of the 
P4-P6 domain of the Tetrahymena self- 
splicing ribozyme (Science [1996]. 273, 
1678-1685). I’ll never forget my sense 
of wonder as the first electron density 
maps revealed the beauty of the RNA 
structure: its coiled helices were dramati- 
cally bent to form a distinctive swan’s- 
neck shape, with magnesium ions nestled 
in the center to hold together phosphate 
backbones that would otherwise spring 
apart due to electrostatic repulsion. At 
2.8 Å resolution, the 160-nucleotide P4-P6
structure was brimming with exciting
details. Recurrent motifs, including the 
tetraloop-receptor interaction, the A-A 
platform, A-minor contacts, and ribose 
zippers, were subsequently observed in 
complex RNAs ranging from the ribosome 
to riboswitches, setting the stage for 
many functional studies and mechanistic 
insights. But for me, the first electrifying 
moments of seeing the P4-P6 RNA struc- 
ture emerge from initially noisy electron 
density remain a standout in my career.



Flu Escapes Immunity 




Ian Wilson 

The Scripps Research Institute 



The crystal structure of influenza virus 
hemagglutinin (HA) in 1981 revolutionized 
our thinking on how animal viruses bind 
and enter host cells and how they evade 
our immune system (Nature [1981]. 289, 
366-373 and 374-378). At the time, the 
trimeric HA glycoprotein from the 1968 
Hong Kong pandemic virus was consid- 
ered a very large protein (200 kDa) for 
structural studies. Given its composition 
of 25% glycosylation by weight, novel 
methods were required to solve its struc- 
ture, using only one heavy atom derivative 
with 3-fold symmetry averaging and sol- 
vent flattening (>80% solvent). Its now 
classic fold was composed of a jelly roll 
globular head for receptor binding and a 
largely helical stem housing the fusion 
machinery. Mapping of natural mutations 
from 1968 to 1977 on the HA structure uncovered
four major antigenic sites, elucidating
how the virus can escape immune
recognition and why we need a seasonal 
vaccine. It was a stunning moment as 
the structure came to life and explained 
a huge amount of influenza virology and 
biology. The recent discovery of broadly 
neutralizing antibodies to flu has now 
uncovered key sites of vulnerability on 
the HA of this virus and has given hope 
for structure-based design of a universal 
vaccine. Remarkably, 33 years after the 
first structure of HA, fascinating insights 
on this viral glycoprotein are still being 
revealed. 



Visualize Longevity Modulator 




Rui-Ming Xu 

Chinese Academy of Sciences 



The discovery in 2000 that Sir2 and 
related proteins (sirtuins) are NAD-depen- 
dent histone/protein deacetylases was 
a landmark achievement following more 
than a decade’s effort of genetic and 
biochemical characterization of yeast 
Sir2 function in transcriptional silencing. 
Sirtuins are found in all three kingdoms, 
but their sequences give no hints of their 
3D structures and biochemical mecha- 
nisms. The first structures of Sirtuins, an 
archaeal protein in complex with NAD 
and an apo form of human SIRT2, re- 
vealed a prominent NAD-binding Ross- 
mann-fold domain (Cell [2001]. 105, 
269-279). NAD was bound in an inverted 
orientation compared to the canonical 
binding mode. This, together with the 
presence of a zinc-binding domain, 
immediately pointed toward the locations 
of the acetyllysine-binding channel and 
the active sites for deacetylation and 
nicotinamide cleavage. Now the catalytic 
mechanisms have been worked out in
considerable detail with the help of many 
structural studies. In addition, the mecha- 
nisms regulating sirtuin activities have 
begun to emerge, with the determination 
of the Sir4-bound structure of Sir2, 
which appears to lock the regulatory and 
catalytic domains of Sir2 in a defined 
configuration to allow productive deace- 
tylation reactions. Mammalian SIRT1 
shares a similar domain structure with 
Sir2, and SIRT1 has been implicated in 
many physiological and pathological 
processes, including aging, cancer, and 
cardiovascular and metabolic diseases. 
There has been keen interest in devel- 
oping small-molecule modulators for 
SIRT1, an area where X-ray crystallo- 
graphy can make further contributions. 
Accelerating HIV-1 Vaccine Efficacy Trials

Dan H. Barouch and Nelson L. Michael
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Despite major advances in HIV-1 therapeutics and prevention strategies, the development of a safe 
and effective prophylactic HIV-1 vaccine will likely be critical for ending the global HIV-1 epidemic. 
Yet only four HIV-1 vaccine concepts have been tested for clinical efficacy over the past 30 years. In 
this Commentary, we describe key hurdles facing the HIV-1 vaccine development field and outline 



strategies to accelerate efficacy 

Current State of the HIV-1 Vaccine 
Field 

Despite the urgent need for a globally 
effective prophylactic HIV-1 vaccine, 
only four HIV-1 vaccine concepts have 
been tested in six clinical efficacy trials 
to date (Table 1). These concepts have 
included (1) Env gp120 proteins, (2) re- 
combinant adenovirus serotype 5 (rAd5) 
vectors, (3) canarypox (ALVAC) vectors 
with gp120 boosts, and (4) DNA vaccines 
with a rAd5 boost. The first efficacy 
studies evaluated monomeric HIV-1 en- 
velope (Env) gp120 protein vaccines 
with alum adjuvant and were tested in 
two phase III vaccine trials. These vac- 
cines failed to prevent HIV-1 acquisition 
in men who have sex with men (MSM) 
and high-risk women in the United States 
and Europe (Vax004) (Flynn et al., 2005), 
as well as in injection drug users in 
Thailand (Vax003) (Pitisuttithum et al., 
2006). 

A rAd5 vector-based vaccine express- 
ing the HIV-1 internal proteins gag! poll 
nef was then tested in the Step (HVTN 
502) and Phambili (HVTN 503) phase lib 
trials. The Step trial, which was conduct- 
ed in MSM and high-risk women in the 
Americas, Caribbean, and Australia, was 
stopped for futility to block HIV-1 acquisi- 
tion (Buchbinder et al., 2008). Subsequent 
analyses suggested an increase in HIV-1 
acquisition in vaccinees, particularly in 
the subgroup of uncircumcised men who 
were seropositive at baseline for Ad5. 
This finding cast a pall over the HIV-1 
vaccine development field and led to 
increased research emphasis on the 
potential importance of vector-specific 
immune responses. The Phambili study 



evaluation of novel HIV-1 vaccine 

tested the same vaccine in high-risk 
heterosexuals in South Africa and was 
stopped during its enrollment phase 
shortly after the Step results were 
announced (Gray et al., 2011). Unblinded 
follow-up of Phambili participants sug- 
gested a very late effect of excess HIV-1 
infections in heterosexual male vaccinees 
without a clear mechanism of action (Gray 
et al., 2014). Concordant with the lack of 
efficacy observed in these clinical trials, 
preclinical studies similarly demonstrated 
that analogous rAd5 vectors expressing
gag/pol/nef from the related simian
immunodeficiency virus (SIV) afforded 
no protection against acquisition of infec- 
tion following mucosal SIV challenges in 
rhesus monkeys (Reynolds et al., 2012). 

The third vaccine concept involved 
priming with a canarypox vector (ALVAC) 
expressing the HIV-1 antigens gag/pol/env
and boosting with the same gp120
protein subunits that were used in the 
Vax003 study. The RV144 study was 
conducted in a low-incidence, mostly 
heterosexual population in Thailand and 
demonstrated vaccine efficacy of 31% 
at 42 months (Rerks-Ngarm et al., 2009). 
Efficacy was 60% at 12 months, indica- 
tive of an early protective effect that 
waned over time. Subsequent analyses 
demonstrated that the risk of HIV-1 infec- 
tion correlated inversely with antibodies 
directed against the first and second 
HIV-1 Env variable regions (V1V2) and
correlated directly with Env-specific IgA 
antibodies (Haynes et al., 2012). Addi- 
tional analyses suggested that HIV-1 
infection risk also inversely correlated 
most clearly with V2-specific antibodies 
of the IgG3 isotype and non-neutralizing
functional activity. Furthermore, a molecular
sieve analysis showed immune
selection pressure on specific V2 amino 
acids in vaccinees (Rolland et al., 2012).
Consistent with the clinical results, 
modest protective efficacy was also 
observed with analogous ALVAC/gp120 
vaccines against mucosal SIV challenges 
in rhesus monkeys. 
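
The efficacy percentages quoted above compare infection rates between vaccine and placebo recipients. As a minimal sketch of that arithmetic, assuming the simple definition of efficacy as one minus the ratio of incidence rates (the published RV144 estimates come from the trial's own statistical analysis, which this does not reproduce), the function name and the counts below are hypothetical:

```python
def vaccine_efficacy(cases_vaccine, person_years_vaccine,
                     cases_placebo, person_years_placebo):
    """Return efficacy as 1 - (incidence in vaccinees / incidence in placebo)."""
    if person_years_vaccine <= 0 or person_years_placebo <= 0:
        raise ValueError("person-years must be positive")
    if cases_placebo == 0:
        raise ValueError("efficacy is undefined with no placebo infections")
    incidence_vaccine = cases_vaccine / person_years_vaccine
    incidence_placebo = cases_placebo / person_years_placebo
    return 1.0 - incidence_vaccine / incidence_placebo


# Hypothetical counts for illustration only; these are not trial data.
protective = vaccine_efficacy(70, 10_000, 100, 10_000)   # fewer infections in vaccinees
harmful = vaccine_efficacy(120, 10_000, 100, 10_000)     # more infections in vaccinees
print(f"protective example: {protective:.1%}, harmful example: {harmful:.1%}")
```

A negative value corresponds to excess infections among vaccinees, the pattern suggested by the follow-up analyses of the Step and Phambili trials.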

The fourth vaccine concept that was 
tested involved priming with DNA vaccines
expressing gag/pol/nef/env and
boosting with rAd5 vectors expressing
gag/pol/env in the HVTN 505 study, which
was a phase IIb study conducted in MSM
in the Americas. Importantly, preclinical 
data showed that this vaccine afforded 
partial protection against low-stringency 
SIV challenges (strain SIVsmE660) in 
rhesus monkeys but failed to protect 
against high-stringency SIV challenges 
(strain SIVmac251) (Letvin et al., 2011). 
HVTN 505 was halted at its first interim 
efficacy analysis for futility to protect 
against HIV-1 acquisition or lower HIV-1 
viral RNA in breakthrough infections 
(Hammer et al., 2013). These data
strongly suggest that HIV-1 vaccine
candidates should be evaluated
exclusively in stringent preclinical challenge
models.

Future HIV-1 Vaccine Efficacy 
Studies 

Several HIV-1 vaccine candidates are 
expected to be evaluated in clinical effi- 
cacy studies in the next few years. The 
Poxvirus-Protein Public Private Partner- 
ship (“P5”) is a collaborative group that 
has been formed to build on the results 
of the RV144 trial and to test the identified
V2 correlate of risk in a new series of HIV-1
vaccine efficacy trials in sub-Saharan
Africa and Thailand. For studies in Africa,
new ALVAC and gp120 vaccine products
with HIV-1 clade C antigens are currently
being manufactured. ALVAC vectors
expressing clade C antigens and MF59-adjuvanted
gp120 subunits are planned
for a phase III licensure study in sub-Saharan
Africa. Other strategies, including
NYVAC and DNA-NYVAC priming
prior to gp120 boosting, are planned
for evaluation in non-licensure track
phase IIb efficacy studies. High-risk
MSM cohorts are also being explored in
Thailand for further efficacy testing.

Table 1. Clinical HIV-1 Vaccine Efficacy Trials

| Study | Vaccines | Phase | Risk Group | HIV Incidence per 100 Person-Years | Location | Result |
|---|---|---|---|---|---|---|
| Vax003 | AIDSVAX B/E gp120 in alum | III | injecting drug users | 3.40% | Thailand | no vaccine efficacy |
| Vax004 | AIDSVAX B/B gp120 in alum | III | high-risk women and MSM | 2.60% | United States, Europe | no vaccine efficacy |
| HVTN 502 Step | MRKAd5 HIV-1 gag/pol/nef B | IIb | high-risk women and MSM | 3.00% | United States | halted at interim analysis for futility; early transient increased infection in vaccinees |
| HVTN 503 Phambili | MRKAd5 HIV-1 gag/pol/nef B | IIb | high-risk heterosexual men and women | 3.70% | South Africa | no vaccine efficacy; late increased HIV infection in unblinded male vaccinees |
| RV144 | ALVAC-HIV vCP1521, AIDSVAX B/E rgp120 in alum | III | community risk heterosexual men and women | 0.28% | Thailand | 31.2% efficacy at 42 months as primary endpoint; 60% efficacy at 12 months |
| HVTN 505 | DNA, rAd5 (A, B, C) | IIb | circumcised MSM without pre-existing Ad5 antibodies | 1.80% | United States | halted at interim analysis for futility |

MSM, men who have sex with men; Ad5, adenovirus serotype 5.

Additional HIV-1 vaccine candidates 
are also being developed for efficacy 
testing. A recombinant adenovirus sero- 
type 26 (rAd26) prime expressing gag/ 
pol/env with a modified vaccinia Ankara 
(MVA) boost expressing the same anti- 
gens has shown substantial protection 
against both SIVmac251 and SHIV- 
SF162P3 challenges in rhesus monkeys 
(Barouch et al., 2012, 2013). Boosting 
with an Env gp140 trimer appears to 
improve this observed protective efficacy 
in monkeys (D.H.B., unpublished data). 
A prototype rAd26 vector expressing env 
was also recently shown to be safe and 
immunogenic in humans with no evidence 
for activation of total or vector-specific 
CD4+ T cells in colorectal mucosa (Baden
et al., 2014). A multivalent Ad26/MVA
vaccine expressing HIV-1 “mosaic” gag/
pol/env immunogens designed for
optimal coverage of global virus diversity,
together with a stable Env gp140 trimer,
is expected to enter clinical trials later
this year. These Ad26/MVA and Ad26/ 
gp140 vaccines are currently under con- 
sideration for advancement into efficacy 
testing. 

Other promising vaccine candidates 
that have entered early phase clinical trials
include priming with DNA vaccines
and boosting with MVA vectors. In addition,
early phase clinical trials are planned
with cytomegalovirus (CMV) vectors, 
which showed substantial virologic con- 
trol and possible clearance of SIVmac251 
in approximately half of vaccinated 
monkeys following challenge (Hansen 
et al., 2013). 

Why So Few HIV-1 Vaccine Efficacy 
Studies? 

Why have so few HIV-1 vaccine efficacy 
trials been conducted to date for a 
problem of such global importance? 
Factors include the enormous scientific 
challenges in designing a vaccine for a 
highly variable virus that integrates in the 
host genome, rapidly establishes latency, 
and effectively evades both humoral 
and cellular immune responses. Although 
certain correlates of risk were identified
in the RV144 study, it is currently not clear
whether these are true mechanistic corre- 
lates of protection or whether they will 
prove generalizable beyond the RV144 
study. In addition, although nonhuman 
primate challenge models have proven 
informative, they do not yet represent 
validated animal models that are neces- 
sarily predictive of clinical efficacy trials. 
As a result, large, complex, iterative 
clinical efficacy studies are required to 
show the efficacy of candidate HIV-1 vac- 
cines in humans. Moreover, the primary 
endpoint of such studies has to be incident
new HIV-1 infections in large populations
of high-risk individuals. Such studies
are expensive and logistically challenging, 
and thus they pose substantial risk to the 
pharmaceutical industry that traditionally 
drives advanced clinical development of 
vaccines. 

Accelerating Clinical Efficacy Trials 

Accelerating efficacy testing of novel and 
promising HIV-1 vaccine candidates will 
be vital for the field. In the absence of 
a validated and generalizable immune 
correlate of protection, only carefully 
designed clinical efficacy trials can determine
whether an HIV-1 vaccine works in
humans. The results of such trials, regard- 
less of their outcomes, will have major 
impacts on the HIV-1 vaccine field, and 
they will lead to immediate prioritization 
and deprioritization of vaccine candidates 
and strategies. Studies that show partial 
protective efficacy will also refine our 
understanding of immune correlates of 
protection. The efficacy trials conducted 
to date have had surprising outcomes 
that have been discordant with the ex- 
pectations of experts in the field, and 
thus the current state of knowledge 
is inadequate to predict the results of 
any such efficacy trials with certainty. 
To accelerate efficacy testing of next- 
generation HIV-1 vaccine candidates, 
increased industry involvement, mobiliza- 
tion of resources, expansion of the current 
vaccine pipeline, and robust preclinical 
challenge studies will likely prove critical. 

Increased industry involvement would 
be highly desirable for the HIV-1 vaccine 
field, particularly for advanced clinical 
development. Industry provides unique 
expertise in terms of manufacturing, regu- 
latory affairs, and product development, 
as well as downstream licensure capacity 
to produce and to deliver a vaccine in 
the event that efficacy trials are successful.
To date, industry involvement has
been relatively modest for the reasons
described above. Both the NIH and
the Bill & Melinda Gates Foundation 
have programs that actively support aca- 
demic-industry partnerships, and these 
programs should be continued and ex- 
panded. Engaging and encouraging in- 
dustry involvement at the earliest stages 
in vaccine development may interest a 
potential industry partner in a particular 
vaccine platform and may lead to in- 
creased involvement for advanced clinical 
development should the scientific ratio- 
nale prove compelling. Early involvement 
of industry may also be critical, as com- 
panies may need the use of particular 
raw materials, cell lines, vectors, or 
manufacturing technologies to be con- 
sistent with their internal platforms or 
processes. 

The size, cost, and logistic complexity 
of HIV-1 vaccine efficacy trials are sub- 
stantial. Thus, in addition to the major 
current investments from the NIH, Gates 
Foundation, and multiple other organi- 
zations, mobilization of new resources 
would greatly accelerate HIV-1 vaccine 
efficacy studies. Three potential sources 
for new funding include industry, governments,
and additional philanthropy. Pharmaceutical
companies may provide direct
support for development activities,
including clinical efficacy trials, if they 
are sufficiently interested in a particular 
vaccine product. Additional investment 
of governments worldwide would be 
highly enabling for the field. Finally, new 
philanthropic funding will also accelerate 
the development of an HIV-1 vaccine, 
as exemplified by the Ragon Institute of 
MGH, MIT, and Harvard. 

Accelerating clinical efficacy trials of 
HIV-1 vaccine candidates also requires 
multiple distinct and scientifically prom- 
ising vaccine candidates in preclinical 
and early phase clinical studies to be 
ready for efficacy testing. It is therefore 
essential to maintain and expand a 
diverse portfolio of vaccine concepts. 
For example, novel Env immunogens are 
being developed by multiple groups to 
elicit broadly neutralizing antibodies as 
well as to optimize functional non-neutral- 
izing antibodies; a series of new vectors 
have been explored that expand the 
breadth and efficacy of virus-specific 
T cell responses; and potential global 
antigens have been developed that 
begin to address the challenge of global 
virus diversity. A robust pipeline of 
new concepts and fresh perspectives 
will also require the engagement and 
encouragement of young and early career 
investigators, particularly those from the 
developing world. 

Expanding preclinical efficacy studies 
will also help to support the rationale for 
clinical efficacy trials. Although the ability 
of nonhuman primate challenge studies 
to predict the outcomes of clinical effi- 
cacy trials still remains uncertain, strin- 
gent SIV and SHIV challenges in rhesus 
monkeys represent the most robust 
model for assessing vaccine candidates 
prior to clinical efficacy trials. Such 
preclinical challenge studies should 
therefore be expanded, particularly for 
vaccine candidates under consideration 
for efficacy trials. When clinical efficacy 
data with these vaccines become avail- 
able, the clinical results can then be 
used to refine and to improve the preclin- 
ical models. 

Perspectives 

HIV-1 vaccine development will likely be 
an iterative process. Robust basic 
research must continue but needs to be 
matched with clinical efficacy testing of
promising new vaccine candidates. Information
learned from each rigorous efficacy
trial will be pivotal and will provide
clear directions for the field. A more 
detailed understanding of immune corre- 
lates of protection will also be obtained 
from these clinical efficacy trials and 
may, ultimately, reduce the need to 
conduct large studies for each new 
vaccine concept. However, at the present 
time, there is no way to determine 
whether a vaccine will prevent HIV-1 
infection in humans other than clinical 
efficacy studies in which incident new 
HIV-1 infections are the primary endpoint. 

There are numerous reasons for 
optimism in the HIV-1 vaccine field. 
The RV144 study showed that an HIV-1 
vaccine is possible, and several novel 
vaccine candidates have demonstrated 
unprecedented efficacy in stringent 
nonhuman primate challenge studies. 
Our basic understanding of HIV-1-specific
humoral and cellular immunity has
expanded considerably, and preclinical 
and clinical immune correlates of pro- 
tection have been identified in certain 
contexts. Increasing the momentum to 
accelerate the conduct of efficacy trials 
will substantially accelerate the develop- 
ment of a safe and effective HIV-1 vac- 
cine, which will presumably be required 
to control the global HIV-1 pandemic. 
Genome replication programs are highly orchestrated. In this issue, Koren and colleagues leverage 
whole-genome sequencing data to discover that DNA replication timing patterns differ between 
individuals and are associated with genetic variants. These so-called rtQTLs represent a new 
form of quantitative trait loci that may influence gene dosage and mutational frequency and provide 
mechanistic insights into the regulation of DNA replication. 



DNA replication, the process of producing 
two identical replicas from one DNA tem- 
plate, is the basis for biological inheri- 
tance in all living organisms. Over 60 
years ago, early techniques of radiolabel- 
ing were used to determine that most 
DNA synthesis occurred in a restricted 
period of the cell cycle, now termed the 
“S” phase for “synthesis.” The obser- 
vation that the inactive X chromosome in 
female placental mammals was late repli- 
cating and corresponded to the hyper- 
condensed “Barr body” indicated that 
the timing of DNA replication is locus- 
specific and functionally important. Repli- 
cation origin usage also varies across cell 
differentiation programs and is strongly 
correlated with three-dimensional chro- 
mosomal architecture within the nucleus 
(Ryba et al., 2010). Notably, the replica- 
tion timing program is also highly unstable 
in human cancers and associated with 
increased translocation and somatic 
mutation rates (Mechali et al., 2013). 
However, we still know relatively little 
regarding the specific mechanisms 
orchestrating the replication timing pro- 
gram across the genomes of higher eu- 
karyotes. 

It has recently been shown that pat- 
terns of replication timing can be inferred 
from comparing the ratios of read depths 
obtained from S phase cells to G1 phase 
cells, with sites of earlier replication 
exhibiting greater sequence coverage in 
S phase cells (Van der Aa et al., 2013; 
Koren et al., 2012). While global replication
timing profiles were highly correlated
across the six individual cell lines that Koren
and colleagues previously examined (Koren et al., 2012),
several questions remained: To what
extent, if at all, do patterns in replication 
timing differ between humans? Are there 
genetic determinants of differences in 
DNA replication timing? If so, what are 
the broader implications of variation in 
replication timing for other aspects of 
genome regulation? 
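
The read-depth comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: reads are already binned into fixed genomic windows, each sample is normalized to its own total depth, and a pseudocount avoids division by zero. The function name, the pseudocount, and the counts are hypothetical and are not taken from the cited studies.

```python
import math


def replication_timing_profile(s_phase_counts, g1_counts, pseudocount=1.0):
    """Return log2(S-phase / G1) depth ratios per window.

    Higher values indicate windows that are over-represented in S phase cells,
    i.e. candidates for earlier replication.
    """
    if len(s_phase_counts) != len(g1_counts):
        raise ValueError("both samples must cover the same windows")
    n_windows = len(s_phase_counts)
    s_total = sum(s_phase_counts) + pseudocount * n_windows
    g1_total = sum(g1_counts) + pseudocount * n_windows
    profile = []
    for s_count, g1_count in zip(s_phase_counts, g1_counts):
        # Normalize each window to the fraction of its sample's total depth.
        s_fraction = (s_count + pseudocount) / s_total
        g1_fraction = (g1_count + pseudocount) / g1_total
        profile.append(math.log2(s_fraction / g1_fraction))
    return profile


# Hypothetical read counts for three windows (early, middle, late).
s_phase = [150, 100, 50]
g1 = [100, 100, 100]
for window, value in enumerate(replication_timing_profile(s_phase, g1)):
    print(f"window {window}: log2 ratio = {value:+.3f}")
```

G1 cells serve as the unreplicated baseline, so dividing by their depth removes window-specific biases that affect both samples equally.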

In this issue of Cell, Koren et al. take a 
large step forward from their earlier work, 
leveraging whole-genome sequencing 
data from 161 individuals sequenced by 
the 1000 Genomes Project to examine 
variation in DNA replication timing in 
humans (Koren et al., 2014). They demonstrate
that variation in sequencing
coverage across chromosomes from
proliferating lymphoblastoid cells is correlated
with previous estimates of replication
timing and thus reflects active
DNA replication (Figure 1A). Koren et al.
then identified and, as importantly, validated
by replication, 16 cis replication
timing quantitative trait loci (rtQTLs) associated
with either the loss/gain of a replication
origin (Figure 1B), variable usage
or activation timing of a replication origin
(Figure 1C), or variable replication initiation
zone lengths (Figure 1D). Remarkably,
these ~20 kb haplotypes are associated 
with replication timing variation up to 2 
Mb away. These rtQTLs are also corre- 
lated with the expression levels of tran- 
scripts up to hundreds of kilobases 
away, with individuals with earlier DNA 
replication having higher mRNA expres- 
sion levels nearby. Lastly, the authors 
identify and discuss a notable associa- 
tion between alleles that predispose to 
JAK2 mutations, commonly implicated in
hematological malignancies, and pronounced
replication. Early replication at
this site occurs in the opposite direction 
of JAK2 transcription, suggesting that its 
fragility might be enhanced by collisions 
between the replication and transcription 
machinery. 

The authors also identified 17 putative
trans associations but discarded them 
because they were likely artifactual asso- 
ciations due to population structure. 
Future work incorporating larger sample 
sizes and correcting for population struc- 
ture using principal component analysis 
or linear mixed models may allow us to 
better assess whether trans-acting polymorphic
genetic factors are involved in
coordinating replication timing. 
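
To illustrate why correcting for population structure matters, the sketch below regresses out a single ancestry covariate from both genotype and phenotype before estimating their association. It is a deliberately simplified stand-in for the principal component or linear mixed model approaches mentioned above: it assumes one precomputed ancestry score per individual, and all names and values are hypothetical.

```python
def _mean(values):
    return sum(values) / len(values)


def slope(y, x):
    """Least-squares slope of y regressed on x."""
    x_mean, y_mean = _mean(x), _mean(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variance")
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    return sxy / sxx


def residuals(y, x):
    """Part of y not explained by a simple linear fit on x."""
    fitted_slope = slope(y, x)
    intercept = _mean(y) - fitted_slope * _mean(x)
    return [yi - (intercept + fitted_slope * xi) for xi, yi in zip(x, y)]


def adjusted_slope(phenotype, genotype, ancestry_score):
    """Genotype effect after removing the ancestry covariate from both variables."""
    return slope(residuals(phenotype, ancestry_score),
                 residuals(genotype, ancestry_score))


# Hypothetical data: two subpopulations (ancestry score 0 or 1) that differ in
# both allele frequency and replication timing, with no true genotype effect.
ancestry = [0, 0, 0, 0, 1, 1, 1, 1]
genotype = [0, 1, 0, 1, 1, 2, 1, 2]      # allele dosage
timing = [0, 0, 0, 0, 2, 2, 2, 2]        # replication timing measure

print("naive slope:", slope(timing, genotype))
print("ancestry-adjusted slope:", adjusted_slope(timing, genotype, ancestry))
```

In this toy example the naive association is driven entirely by the difference between subpopulations and disappears once the ancestry score is accounted for, which is the kind of artifact that can arise when testing trans associations genome-wide.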

The novel approach of Koren et al. 
(2014) of inferring replication timing and 
origins directly from whole-genome 
sequence coverage of proliferating cells 
offers researchers the opportunity to 
investigate many unanswered questions 
without the need for isolating cells in spe- 
cific stages or labeling newly synthesized 
nucleotides. To the extent that the corre- 
sponding genome sequences derive 
from proliferating cells, this approach 
may be readily extendable to shotgun 
genome sequences currently available, 
such as The Cancer Genome Atlas 
(TCGA), the Encyclopedia of DNA Ele- 
ments (ENCODE), and The Genotype- 
Tissue Expression project (GTEx), to 
examine the relevance of DNA replica- 
tion timing variation in cancer, functional 
regulatory elements throughout the 
genome, and gene regulation across 
tissues. For instance, previous work has
demonstrated that replication origin activity
is associated with open chromatin
structure and correlated globally with
gene expression across a genome (Hiratani
et al., 2008). Koren et al. observed
enrichment of rtQTLs within enhancers
active in lymphoblastoid cell lines and
mRNA expression QTLs (eQTLs) and
speculate that these rtQTLs could alter
chromatin structure or DNA sequences
bound by factors involved in replication
initiation. As such, next steps should
include examining the relationships between
rtQTLs and DNaseI hypersensitivity
QTLs, histone modification QTLs, and
nucleosome positioning as potential
mechanisms underlying these cis-acting
associations. Furthermore, as with other
genetic determinants of molecular functionality,
it will be interesting to explore
the extent to which rtQTLs are global, tissue-specific,
population-specific, or temporal
at different stages of development.

Figure 1. Inferring Variation in Replication Timing from Whole-Genome Sequencing 

(A) DNA replication is initiated at replication origins (yellow circles). Early replication origins (right) 
synthesize more DNA than late replication origins (left), which can be observed as different sequencing read 
depths. Replication timing quantitative trait loci (rtQTLs) can result in allele-specific variation in DNA 
replication, indicated as more reads derived from the “A” than the “T” haplotype from the early replication 
origin. 

(B) Approximately half of variant regions differed by the presence or absence of a peak, indicating gain or 
loss of a replication origin or origin cluster. 

(C) ~25% of variant regions involved differences in peak height, indicating variable usage or activation 
timing of a replication origin. 

(D) The remaining quarter involved a shift in the replication slope region, indicating variable replication 
initiation zone lengths. 

“AA,” “AT,” and “TT” designate arbitrary homozygous alleles and their heterozygous intermediates 
associated as rtQTLs at three different variable regions in (B-D). 

Koren et al. here identify a striking 
relationship between shotgun genome 
sequence coverage and replication timing 
and exploit this relationship to define a 
new type of QTL. These rtQTLs can 
potentially elucidate mechanisms con- 
necting common variants associated 
with complex phenotypes, such as 
human disease and drug response, with 
intermediate phenotypes, such as mRNA 
expression levels and chromatin accessi- 
bility states. Given the previously known 
relationship between replication timing
and increased mutational rates (Koren
et al., 2012; Stamatoyannopoulos et al., 
2009), rtQTLs could also affect the proba- 
bility of somatic mutations at specific loci 
involved in human disease and cancer 
susceptibility, such as in the authors’ 
JAK2 example; therefore, quantifying 
DNA replication timing in association 
with mutation rate could address what 
proportion of mutation load is because 
of differential replication origin usage or 
efficiency. More generally, future work 
examining the factors involved in regu- 
lating replication timing could allow us 
to better understand the causes and 
consequences of mutation rate variation 
throughout the genome. Lastly, in addi- 
tion to facilitating the identification of 
the causal variants underlying rtQTLs 
and other molecular QTLs, genome edit- 
ing approaches such as CRISPR/Cas9 
may allow us to examine the molecular 
consequences of directly manipulating 
replication timing in proliferating cells. 
HIV-1 virion assembly at the plasma membrane requires the selective recruitment of the viral RNA 
genome into nascent viral particles while cellular transcripts are excluded. Kutluay et al. now 
demonstrate that this is a two-step process in which Gag binds sequentially to different sites on 
the viral genome. 



The final step in the life cycle of HIV-1 is 
the assembly and release of progeny 
virions by budding through the infected 
cell plasma membrane (Sundquist and 
Krausslich, 2012). In this issue of Cell, 
Kutluay et al. (2014) demonstrate that 
the selective recruitment of the viral RNA
genome into virions, mediated by the
nucleocapsid (NC) domain of the viral
Gag polyprotein, involves the sequential
interaction of Gag with distinct genomic 
RNA target sites, first in the infected cell 
cytoplasm and then during the formation 
of nascent viral particles at the plasma 
membrane. Moreover, they also show 
that transient binding of tRNAs to the ma- 
trix (MA) domain of Gag plays a key role in 
targeting Gag to the plasma membrane. 

Analysis of the requirements for selec- 
tive recruitment of an RNA into HIV-1 
virions has previously identified a structured
RNA element called “psi” (“Ψ”),
located in the viral RNA 5' UTR, that
is selectively bound by NC and that
greatly facilitates genomic RNA packaging
(Clever et al., 1995; Lu et al.,
2011). However, the specificity of the
Ψ:NC interaction is modest, and regions
outside of Ψ have been proposed to also
facilitate genomic RNA packaging (Clever 
et al., 1995; Richardson et al., 1993). 
Moreover, while the initial interaction of 
Gag monomers with the genomic RNA 
occurs in the cytoplasm, the formation of 
multimeric Gag complexes on the RNA 
genome is dependent on localization to 
the plasma membrane (Kutluay and Bien- 
iasz, 2010). 

To address which RNA:protein interac- 
tions are involved in HIV-1 virion morpho- 
genesis and whether these are modulated 
in different cellular compartments, i.e., in 
the cytoplasm, at the plasma membrane, 
and inside HIV-1 virions, Kutluay et al.
(2014) developed a remarkably efficient
variation on the previously published
“photoactivatable ribonucleoside-enhanced
crosslinking and immunoprecipitation”
(PAR-CLIP) technique (Hafner et al., 
2010). PAR-CLIP relies on the incorpora- 
tion of the photoreactive uridine analog 
4-thiouridine into newly synthesized RNA 
molecules, which confers efficient cross- 
linking upon brief UV irradiation. Immuno- 
precipitation, using antibodies specific for 
Gag or for individual Gag domains, com- 
bined with RNase treatment and deep 
sequencing, then allowed a comprehen- 
sive analysis of the RNA sequences 
bound by these proteins. 

Initial analysis gratifyingly reveals
strong binding of Gag to the Ψ sequence
but also demonstrates strong binding to 
the highly structured Rev response 
element (RRE), which is required for HIV- 
1 genomic RNA export to the cytoplasm 
and serves as the binding site for the 
HIV-1 Rev protein (Malim et al., 1989). 
However, only ~5% of RNA binding by 
Gag involves viral transcripts and the 
enrichment of viral RNA in the Gag-bound 
versus total RNA pool is only 3- to 5-fold 
relative to cellular RNAs, consistent with 
the relatively low in vitro specificity of 
Gag for Ψ (Richardson et al., 1993). In
contrast, analysis of HIV-1 Gag binding 
to RNA in immature virions gives a very 
different picture. In particular, virion RNA
now contributes ~50% of all Gag binding
sites, and these are scattered across
the entire viral RNA genome. These bind- 
ing sites are not, however, random and 
in fact reveal a clear preference for 
“A”-rich RNA sequences. Together, 
these data reveal a two-step process 
for virion RNA recruitment into virions. 
In the cytoplasm, largely monomeric 
Gag binds to both cellular RNAs and,
with modest specificity, to viral genomic
RNAs. Both cellular and viral RNA-bound 
Gag molecules are then recruited to the 
inner surface of the plasma membrane 
(see below), where they have the potential 
to serve as nucleation sites for Gag 
multimerization. However, this process 
requires the specific interaction of Gag 
with A-rich sequences, scattered across 
the viral genome, that are absent on 
most cellular transcripts, thus explaining 
the selectivity of genomic RNA pack- 
aging. 
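
The fold enrichment quoted above compares the share of viral RNA among Gag-bound reads with its share in the total RNA pool. A minimal sketch of that ratio follows, assuming simple read counts as input; the function name and the counts are hypothetical and chosen only to fall in the range the text describes.

```python
def fold_enrichment(viral_bound, total_bound, viral_input, total_input):
    """Ratio of the viral fraction in the bound pool to the viral fraction in total RNA."""
    if total_bound <= 0 or total_input <= 0:
        raise ValueError("read totals must be positive")
    if viral_input <= 0:
        raise ValueError("enrichment is undefined without viral reads in the input")
    bound_fraction = viral_bound / total_bound
    input_fraction = viral_input / total_input
    return bound_fraction / input_fraction


# Hypothetical counts: 5% of Gag-bound reads are viral versus 1% of total RNA reads.
enrichment = fold_enrichment(viral_bound=50, total_bound=1000,
                             viral_input=10, total_input=1000)
print(f"viral RNA enrichment in the Gag-bound pool: {enrichment:.1f}-fold")
```

A value near 1 would mean Gag binds viral and cellular RNAs in proportion to their abundance, so a modest enrichment is consistent with the low cytoplasmic specificity described in the text.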

While the NC domain of Gag is required 
for genomic RNA recruitment into virions, 
the amino-terminal MA domain is respon- 
sible for the recruitment of Gag, and any 
NC-bound RNA species, to the plasma 
membrane (Chukkapalli et al., 2010). 
Gag recruitment to the plasma mem- 
brane requires the interaction of a highly 
basic region (HBR) in MA with acidic 
phospholipids, especially phosphatidylinositol
4,5-bisphosphate [PtdIns(4,5)P2],
that are only present on the plasma mem- 
brane. The HBR of MA has previously 
been reported to also bind RNA nonspe- 
cifically, and RNA had been proposed 
to compete with membrane binding by 
Gag, thus favoring Gag recruitment to 
high-affinity binding sites on the plasma 
membrane and blocking binding to 
intracytoplasmic membranes (Chukka- 
palli et al., 2013; Chukkapalli et al., 
2010). The data presented by Kutluay 
et al. (2014) extend these earlier in vitro 
observations by showing that MA indeed 
binds RNA efficiently in infected cells. 
More importantly, they identified the 
bound RNAs as almost exclusively 
tRNAs. The tRNAs bound to MA are, 
however, released during virion forma- 
tion, and it is therefore unlikely that MA 
facilitates the virion incorporation of the
lysine tRNA primer, especially as this
tRNA is not selectively bound by MA.

Figure 1. RNA Binding to the HIV-1 Gag Protein 

(A and B) Schematic representation of how tRNA binding to the Gag matrix (MA) domain facilitates 
recruitment of Gag to the plasma membrane (A) and the two-step process by which the nucleocapsid 
(NC) domain of Gag recruits HIV-1 genomic RNAs into newly formed virions (B). See text for detailed 
discussion. CA, capsid domain of Gag; Ψ, psi RNA packaging signal; RRE, Rev response element. 

In conclusion, this report (Kutluay et al.,
2014) pulls together a number of previous
in vitro studies and substantially extends 
these earlier data using highly sensitive 
analytical techniques in HIV-1 infected 
cells. These data reveal that two RNA 
binding domains in Gag, located in MA 
and NC, both play a key role in virion 
formation (Figure 1). tRNA binding by 
MA blocks Gag recruitment to intracellular 
membranes and instead promotes 
Gag binding to the negatively charged 
plasma membrane. Simultaneously, the
NC domain initially recruits a mixture
of cellular and, preferentially, viral tran- 
scripts to the plasma membrane, where 
viral genomic RNAs are then incorporated 
into virions due to selective binding of 
NC to specific A-rich sequences on the 
viral RNA genome during Gag multimeri- 
zation (Figure 1). 

One question left unresolved is whether 
viral RNA binding sites for Gag other 
than Ψ, for example in the RRE, actually
promote the recruitment of cytoplasmic 
RNA into virions, as also suggested by 
earlier work (Richardson et al., 1993). 
Kutluay et al. (2014) do show that deletion
of the RRE, combined with substitution
by an unrelated nuclear RNA export signal 
to allow the HIV-1 genomic RNA to reach 
the cytoplasm, fails to uncover any role 
for the RRE in viral RNA packaging when 
the Ψ element is intact. Nevertheless,
it remains possible that the RRE, or other 
regions in the HIV-1 genome, are largely 
redundant with Ψ so that a role for
these RNA sequences in viral RNA packaging
would only be uncovered when Ψ
is partially or fully inactivated. 
Understanding nitrogen metabolism in plants holds promise for future agricultural improvements. 
Chellamuthu et al. now identify a feedback regulation in plant nitrogen metabolism through 
glutamine sensing. This mechanism appears to be conserved from algae to flowering plants with 
a few surprising exceptions. 



At the turn of the 19th century, newly elected
president of the British Academy of
Science William Crookes gave a dire 
prediction to the assembled academy. 
He stated bluntly, “...all civilized nations 
stand in deadly peril of not having enough 
to eat” (Crookes, 1898). Crookes argued 
that a lack of fertilizer, in particular fixed 
nitrogen, would fatally limit food produc- 
tion and bring about global starvation. 
Thankfully, the development of the 
Haber-Bosch process in the early decades
of the 20th century allowed for the
catalyzed fixation of atmospheric nitrogen 
into active species suitable for use in fer- 
tilizer at an industrial scale. Combined 
with the advances in plant breeding and 
agricultural technology from the “Green 
Revolution” of the mid-20th century, we
continue today to stay ahead of Crookes’ 
dire predictions. As the human population 
continues to expand, further understand- 
ing and optimization of plant nitrogen uti- 
lization remains a pressing task. Toward 
this end, Chellamuthu et al. (2014) in 
this issue of Cell describe a significant 
advance in the understanding of the mo- 
lecular feedback control of nitrogen meta- 
bolism across the plant kingdom. 

Beyond using nitrogen for protein or 
nucleotide synthesis alone, many plants 
also produce large quantities of
nitrogen-containing specialized metabolites known
as alkaloids, caffeine being a well-known 
example. Plants have also evolved to 
store excess nitrogen in both seeds and 
vegetative tissues in the form of special- 
ized storage proteins. These proteins 
store nitrogen both in the amide back- 
bone of the polypeptide, as well as by 
having a high proportion of amino acids
enriched with nitrogen-containing R
groups, such as arginine, asparagine, 
and glutamine (Staswick, 1994). Unlike 
vertebrate nitrogen metabolism, which, 
particularly in the case of carnivores, often 
operates in a state of nitrogen excess 
tuned toward excretion, plants, algae, 
and bacteria must assimilate inorganic 
nitrogen from the environment and there- 
fore have to carefully balance nitrogen uti- 
lization. Perhaps unsurprisingly, studies 
of nitrogen metabolism in these organ- 
isms have revealed elegant regulatory 
mechanisms. 

In the case of bacteria, previous studies 
indicated that nitrogen metabolism was 
regulated through a key conserved protein,
known as PII (Forchhammer, 2007).
PII integrates concentration information
of key metabolites, including ATP/ADP, 
2-oxoglutarate, and glutamine, in order 
to regulate downstream proteins in nitro- 
gen metabolism. Providing a chemical 
logic for this feedback, 2-oxoglutarate, 
ATP, and glutamine all participate in the 
dominant metabolic pathway for nitrogen 
assimilation, the glutamine synthetase/
glutamine oxoglutarate aminotransferase 
(GS/GOGAT) pathway (Forchhammer, 
2007). While PII senses ATP and 2-oxoglutarate
levels through a direct binding
interaction, glutamine sensing is trans- 
duced through a phylogenetically diver- 
gent array of secondary sensors, which 
posttranslationally modify PII in response
to glutamine levels. PII then binds and regulates
downstream enzymes in nitrogen
metabolism, most notably the committed 
step in arginine biosynthesis, N-acetylglu- 
tamate kinase (NAGK) (Heinrich et al., 
2004). NAGK is feedback inhibited by
arginine, its ultimate downstream product.
PII binding activates NAGK by relieving
this inhibitory feedback of arginine.

The chloroplasts of plants share 
ancestry with modern cyanobacteria, and 
plant PII has been conserved throughout
the long evolutionary divergence between
plants and bacteria (Chellamuthu et al.,
2013). Genetic studies in the model flowering
plant Arabidopsis thaliana indicated
that loss of PII produced a number of
nitrogen metabolism-related phenotypes 
(Ferrario-Mery et al., 2006). Furthermore, 
in vitro biochemical and structural ana- 
lyses demonstrated remarkable similarity 
of PII with its bacterial counterparts,
including the interaction with NAGK (Miz- 
uno et al., 2007). However, the question 
remained if the divergent mechanisms of 
secondary sensing of glutamine through 
posttranslational modifications of PII were
conserved. 

To better define the regulatory role of 
PII in plant nitrogen metabolism, Chellamuthu
et al. (2014) now undertake a
biochemical study on NAGKs and their 
corresponding PII partners from both
A. thaliana and the model green alga
Chlamydomonas reinhardtii. Like the
PII-NAGK interaction in bacteria, previous
studies had shown that A. thaliana PII
constitutively relieves arginine feedback 
inhibition of NAGK (Beez et al., 2009). 
However, the experiments of Chellamuthu
et al. with C. reinhardtii PII and
NAGK produce a surprising finding. In
contrast to A. thaliana and bacterial PII,
C. reinhardtii PII does not relieve arginine
feedback inhibition of NAGK. Chellamuthu
et al. surmise there might be a missing
small molecule in their assays that PII
could be sensing. After assaying a number
of primary metabolites, they find
that addition of glutamine rescues the
C. reinhardtii PII-NAGK interaction. Glutamine,
the metabolite previously known
only to be sensed by PII through secondary
proteins, appears to be sensed
directly, revealing a dramatic mechanistic
divergence in PII between plants and
bacteria.

Figure 1. PII Proteins Directly Sense Glutamine for NAGK Activation and Arginine
Biosynthesis in Plants, Except in Brassicaceae

PII activates arginine biosynthesis by antagonizing the feedback inhibition of arginine upon the committed
step of arginine biosynthesis, NAGK. In every plant taxon except Brassicaceae, the family that contains the
model flowering plant Arabidopsis thaliana, glutamine is directly sensed by binding to a conserved
C-terminal extension, termed the Q loop, of PII. In the case of Brassicaceae, glutamine is not required for
activation of NAGK by PII. In both cases, bound ATP appears to be required for PII to antagonize NAGK
feedback inhibition.

Curiously, plant PII proteins have a functionally
uncharacterized but conserved
C-terminal extension, which is absent in
bacteria. Through X-ray crystallography,
it is clear that the C-terminal extension in 
C. reinhardtii PII forms a so-called Q loop
region, which constitutes a large proportion
of the glutamine-binding site. Interestingly,
the Q loop of A. thaliana and other
Brassicaceae contains a three-amino-acid
deletion, rendering their PII unable to
directly sense glutamine. In C. reinhardtii,
NAGK activation through PII binding is
glutamine-dependent, whereas deletion
of the Q loop region in PII completely abolishes
its ability to bind to NAGK. Measured
in vitro, the EC50 of glutamine for NAGK
activation by PII is in the millimolar range,
which, as noted by the authors, is compa- 
rable to reported intracellular concen- 
trations of glutamine. Future research is 
required to demonstrate the physiological 
relevance of this sensing mechanism 
in vivo. 

Based on phylogenetic sequence anal- 
ysis and in vitro studies of representative 
members of diverse lineages, the PII Q
loop and direct glutamine sensing appear
to be widely conserved in the plant
kingdom (Figure 1). However, many interesting
questions arise when considering
the functional relevance of the unexpected
divergence of PII within Brassicaceae.
For instance, are the observed Q
loop deletions an adaptive change or
a deleterious mutation that became
fixed in the common ancestor of Brassicaceae?
Could Brassicaceae contain alternative
glutamine-sensing mechanisms to
compensate for the loss of glutamine-sensing
capability in its PII? In vivo studies
of these phenomena would be important
to reveal if PII divergence represents an
exploitable trait for genetic engineering 
of crops. 

Although the basic facts about the ribosome were already known 40 years ago, elucidating its
atomic structure and molecular mechanisms required sheer persistence and the innovative use 
of new technology and methods. These advances have transformed our understanding of transla- 
tion in the cell. 



Two years after Cell was born, I left theoretical physics to 
become a graduate student in biology at UCSD, and two years 
after that, I joined Peter Moore’s laboratory at Yale to begin 
working on the ribosome. So by a strange coincidence, my life 
as a ribosome scientist has largely overlapped with the life of 
Cell. It is interesting to look back at what was known about the 
ribosome at Cell’s inception, and the remarkable progress that 
has been made since. 

That progress can be visually encapsulated in Figure 1, 
which shows the structure of the ribosome 40 years ago 
(Lake, 1976) compared to a recent atomic structure of a 
translating ribosome bound to the translocon (Voorhees et al., 
2014). The latter was obtained by single-particle electron 
cryomicroscopy (cryoEM), a technique that not only did not 
exist in 1974, but which only recently has been capable of 
producing high-resolution structures of asymmetric objects 
like the ribosome. 

The state of the ribosome in 1974 was captured in a compre- 
hensive book, entitled simply “Ribosomes,” with contributions 
by many leading scientists of the day (Nomura et al., 1974). It 
marked the end of what is commonly referred to as the “golden 
age” of molecular biology, during which the salient facts about 
the ribosome had already been established. Ribosomes from 
all species consist of two subunits (30S and 50S for
bacteria and 40S and 60S for eukaryotes) and in most species
are two-thirds RNA by mass. They contain over 50 proteins,
and three large ribosomal RNA (rRNA) molecules. The small 
subunit binds mRNA and the large subunit carries out peptidyl 
transfer. The tRNAs bound in the P and A sites hold the nascent 
peptide chain and the new amino acid to be added, respectively. 
Protein factors act at each stage of the process, and many of 
these are GTPases. These basic facts have been textbook 
material ever since. 

Given the complexity of the ribosome, to go beyond that basic 
level of understanding seemed a daunting task, so only a few 
aficionados persisted in working on it. During the subsequent 
decades, three qualitative advances have changed our under- 
standing of the ribosome. The first was the emergence of the 
idea that the ribosome is primarily an RNA-based machine. 
The second was the atomic structure of the ribosome, which 
not only confirmed that idea, but paved the way for ever more 
sophisticated experiments to understand the mechanism of 
translation. Finally, we are now beginning to understand the ribosome
as a dynamic machine in which large conformational
changes are essential to its function. 

The Ribosome as an RNA Machine 

Early on, it was thought the many different proteins of the ribo- 
some might be responsible for its various functions, with the 
RNA as a scaffold to hold the various proteins in place. However, 
the ribosome poses the classic “chicken or egg” question: If the 
ribosome consists of both RNA and protein, and is needed to 
make protein, how did it originate? Crick (1968) rather pre- 
sciently wrote, “It is tempting to wonder if the primitive ribosome 
could have been made entirely of RNA” (original italics). To my 
knowledge, this was the first idea that RNA could both carry 
genetic information and perform catalysis and can be thought 
of as the origin of the “RNA world hypothesis,” which postulates 
a primordial world consisting of replicating RNA molecules 
before the advent of proteins. 

The first indication that rRNA contributes to translation at all 
arose from the discovery that complementarity between the 3' 
end of 16S rRNA and a sequence on mRNA upstream of the 
start codon was important for proper initiation of translation 
(Steitz, 1969; Shine and Dalgarno, 1974). However, base pairing 
was already known to be a feature of nucleic acids and unrelated 
to the broader functions of the ribosome such as catalysis, ligand 
binding, and movement. Similarly, modification of rRNA, but not 
protein, was shown to affect tRNA binding, but in the absence 
of any evidence for catalytic properties of rRNA, it was sug- 
gested that the binding sites must consist of both RNA and 
protein (Noller and Chaires, 1972). 

The sequencing of 16S and 23S ribosomal RNAs (Brosius 
et al., 1978; Brosius et al., 1980) enabled the establishment of 
their secondary structure (Woese et al., 1980; Glotz and Brima- 
combe, 1980) and had consequences for biology far beyond 
the ribosome: they paved the way for the discovery of a third 
branch of life, the archaea (Woese et al., 1990). 

In conjunction with the ability to reconstitute the ribosome 
from purified components (Held et al., 1973; Nierhaus and 
Dohme, 1974), knowledge of the sequence of rRNA also laid 
the groundwork for the use of chemical footprinting methods 
to study its interaction with various ligands (Moazed and Noller, 
1986; Moazed and Noller, 1989a). Such studies, as well as com- 
plementary crosslinking methods (Doring et al., 1994), identified 
regions of rRNA that interact with tRNAs in the A, P, and E sites. 

Figure 1. The Ribosome Then and Now

(A) Model of the E. coli ribosome 40 years ago, obtained by negative stain 
electron microscopy (Lake, 1976). The small subunit is shown in yellow and the
large subunit in red. 

(B) High-resolution structure by single-particle cryoEM of a translating 
mammalian ribosome (small subunit in yellow and large in blue) bound to the 
translocon (red) and A- and P-site tRNAs in purple and green respectively 
(Voorhees et al., 2014). 

These studies provided initial physical constraints for building 
models of the ribosome. 

Further evidence for functional roles for rRNAs came from 
studies on antibiotic binding sites in the ribosome. The first 
antibiotic resistance mutation mapped (to streptomycin) was 
on ribosomal protein S12 (Traub and Nomura, 1968), but in 
hindsight this was misleading, since antibiotics have little affinity 
for ribosomal proteins. Subsequent chemical footprinting 
showed that antibiotics, including streptomycin, interact with 
specific and distinct sites on ribosomal RNA (Moazed and Noller, 
1987a; Moazed and Noller, 1987b). 

Together, these studies suggested that rRNA was likely to be 
of functional importance. However, the notion of the ribosome 
as fundamentally an RNA enzyme really gained plausibility only 
when RNA catalysis was discovered in the context of the group 
I intron (Zaug and Cech, 1986) and RNase P (Guerrier-Takada 
et al., 1983). These discoveries led to a resurgence of interest 
in the ribosome and ribosomal RNA in particular (Moore, 1988). 

Unlike the case with the simpler group I intron and RNase P, 
proving that catalysis in the ribosome was a property of RNA 
proved difficult. When thermophilic 50S subunits were treated
extensively with protease and phenol extraction to remove
proteins, they nevertheless retained peptidyl transferase activity, 
strongly suggesting a catalytic role for RNA (Noller et al., 1992). 
However, even after extensive protease treatment, several pep- 
tide fragments and even some entire proteins resisted digestion 
and extraction, and the complete removal of proteins resulted 
in a loss of activity. So, as implicitly acknowledged by the 
authors in their cautiously worded title, “Unusual resistance of 
peptidyl transferase to protein extraction procedures,” these 
experiments were not conclusive. The limitations of such exper- 
iments were foreseen by Crick, who, when he suggested that 
primitive ribosomes may have consisted entirely of RNA, also 
said, “Without a detailed knowledge of the structure of pre- 
sent-day ribosomes it is difficult to make an informed guess” 
(Crick, 1968). As discussed below, that structure emerged only many
decades after the discovery of the ribosome.

Toward the Atomic Structure of the Ribosome 

Forty years ago, the gross morphology of the two ribosomal 
subunits had been determined by conventional negative-stain 
electron microscopy with the 30S subunit having a “head,” a
“platform,” and a “body” and the large subunit having a central 
protuberance flanked by two stalks. But strikingly, even essential 
features such as the number and location of all of the tRNA bind- 
ing sites and the path of the nascent peptide were established 
only gradually. 

tRNA Binding Sites

Aminoacyl tRNAs are the substrates of the ribosome, respon- 
sible for delivering each new amino acid to the growing polypep- 
tide chain. Early on, it was accepted that the ribosome would 
have two tRNA binding sites, a P site occupied by a tRNA holding 
the nascent peptide and an A site for the tRNA delivering the new 
amino acid corresponding to the codon on mRNA. A third site, 
called the E (exit) site, into which the deacylated tRNA moves 
prior to being ejected from the ribosome, was first proposed a 
long time ago (Wettstein and Noll, 1965), but even its existence 
remained controversial until convincingly demonstrated by 
Nierhaus and coworkers (Rheinberger et al., 1981). The fact 
that the E site spans both the small and large subunit became 
universally accepted only after more detailed structures of the 
ribosome by cryoEM began to emerge (Agrawal et al., 1996; 
Stark et al., 1997). These studies also showed that the mRNA
snakes around a cleft in the small subunit. 

Passage of the Nascent Peptide through a Tunnel
in the Large Subunit 

Early cartoons of the ribosome and a more recent sculpture at 
Cold Spring Harbor Laboratory simply show the peptide chain
being extruded from the intersubunit space. The existence of a 
tunnel in the large subunit through which the nascent peptide 
must emerge was inferred by localizing antibodies to the nascent 
peptide to a region opposite the intersubunit interface (Bernabeu 
and Lake, 1982). More direct evidence for an exit tunnel came 
from electron microscopy of two-dimensional crystalline sheets 
or sections (Milligan and Unwin, 1986; Yonath et al., 1987), but 
the tunnel was firmly established only when cryoEM reconstruc- 
tions of sufficient resolution became available (Frank et al., 
1995), and its atomic nature was elucidated by analysis of the 
crystal structure of the large subunit (Ban et al., 2000; Nissen 
et al., 2000). The discovery of the exit tunnel solved a number
of mysterious problems, such as how the ribosome ensures
proper insertion of membrane proteins into a lipid bilayer by
direct docking of the tunnel exit with the translocon, and it
clarified the context in which proteins emerge from the
ribosome and begin to fold.

Beyond Overall Morphology 

Despite the characterization of essential features, going beyond 
the overall morphology of the ribosome proved to be difficult. 
Various approaches to chipping away at the problem resulted 
in a mass of data on protein-RNA interactions (Stern et al., 
1989; Powers and Noller, 1995; Greuer et al., 1987; Guile et al., 
1988) and on the approximate spatial location of proteins by 
neutron scattering (Capel et al., 1987). The hope was that in 
conjunction with emerging structures of individual ribosomal 
proteins (Ramakrishnan and White, 1998) and improving cryoEM 
reconstructions, this information could be combined to generate 
a molecular model for the ribosome. There appeared to be no 
realistic alternative on the horizon. However, the low resolution
of the information and lack of an overall three-dimensional 
context made it difficult to obtain a model of sufficient accuracy 
and resolution to make deductions about its mechanism. 

Crystallography of the Ribosome

Ever since the 1950s, crystallography has been used to deter- 
mine the structure of macromolecules. Beginning in the 1960s 
(Byers, 1966), the ribosome itself had been shown to form two- 
dimensional crystalline sheets, and such sheets were used to 
obtain structures by electron microscopy (Unwin, 1977). These 
early studies showed that the ribosome had a defined struc- 
ture that might be capable of producing three-dimensional 
crystals. Moreover, such crystals had been obtained for large 
viruses, the nucleosome core particle and F1 ATPase. Against
this backdrop, it seemed reasonable to attempt the crystalli- 
zation of the ribosome. Although the structure of some viruses 
had been determined to high resolution, they represented a 
special case of high symmetry. In contrast, even if suitable 
crystals could be obtained, solving the structure of such a 
large asymmetric structure as the ribosome seemed a pipe 
dream because the technology and methods to solve it did 
not exist. 

Nevertheless, crystals of the 50S subunit were obtained from a 
thermophile, Bacillus stearothermophilus (Yonath et al., 1980),
and were followed a few years later by crystals of the 30S subunit
and the entire 70S ribosome from Thermus thermophilus 
(Trakhanov et al., 1987). These early crystals did not diffract 
well. Almost a decade later, the discovery of crystals of the 
50S subunit from the archaeon Haloarcula marismortui that
could diffract to 3 Å resolution (von Bohlen et al., 1991) meant
that at least in principle it was possible to determine an atomic 
resolution structure. Cryocrystallography, in which crystals are 
cooled to 100 K to minimize radiation damage, was another 
essential step that allowed data collection (Hope, 1988). Never- 
theless, for many years, the prospect of solving the ribosome’s 
structure or even producing maps with recognizable features 
seemed out of reach. 

Ultimately, it took technical advances in synchrotron X-ray 
sources and detectors, computing, as well as the participation 
of several research groups who brought in new ideas for
structure determination, to eventually obtain complete atomic
structures of the 50S and 30S subunits (Ban et al., 2000; Wimberly
et al., 2000). These structures were used to model a
5.5 Å structure of the entire ribosome with mRNA and tRNAs
the following year (Yusupov et al., 2001). A few years later, 
high-resolution structures were obtained of both the empty 
ribosome (Schuwirth et al., 2005) and that of a complex with 
mRNA and tRNAs (Selmer et al., 2006). Currently there are crys- 
tal structures of the bacterial ribosome in many functional states 
(reviewed in Voorhees and Ramakrishnan, 2013). Moreover, 
there are now high-resolution structures of the entire eukaryotic 
ribosome (Ben-Shem et al., 2011) and the isolated 40S and 60S 
subunits (Rabl et al., 2011; Klinge et al., 2011).

These structures have transformed our understanding of 
the ribosome. The peptidyl transferase and decoding centers 
consist almost entirely of rRNA, showing that the ribosome is 
fundamentally an RNA-based enzyme. The structures have 
enabled sophisticated biochemical and genetic experiments to 
elucidate both basic mechanisms and the regulation of ribo- 
some function. Moreover, the structures have enabled the direct 
visualization of antibiotics bound to the ribosome, enabling a 
better understanding of their mode of action and the potential 
for design of new, improved antibiotics (reviewed in Wilson, 
2014). 

The Ribosome in the 21st Century
How Does the Ribosome Work? 

Whereas biochemical experiments designed to obtain structural 
information, like footprinting and crosslinking, have been almost 
completely superseded by direct methods such as crystallog- 
raphy or cryoEM, those designed to probe mechanisms are 
thriving. In particular, the development of pre-steady-state 
kinetics that use a combination of fluorescent reporters and 
quench flow methods has greatly helped dissect the various 
steps along the translation pathway (e.g., Rodnina and Winter- 
meyer, 2001). These methods have been further extended by 
the development of single-molecule fluorescence techniques, 
which can probe the rate and sequence of specific conforma- 
tional changes in the ribosome during translation (Blanchard 
et al., 2004). Complementing such studies is the direct measurement
of force (and thus work) required during translation
(Liu et al., 2014).

Alongside the biochemistry, molecular dynamics, which
has the potential to calculate both reaction mechanisms (Åqvist
et al., 2012) and rates as well as trajectories in large-scale
changes (Sanbonmatsu et al., 2005), is likely to play an
increasing role in complementing experimental methods to 
probe ribosome function. These diverse methods are providing 
a detailed understanding of the mechanisms of translation. 

In addition to understanding the internal mechanism of ribo- 
somes, we are now getting insight into broader questions of 
translation in the cell. A particularly exciting advance is the 
method of ribosome profiling, which provides a genome-wide 
in vivo snapshot at nucleotide resolution of ribosomes along 
mRNA (Ingolia et al., 2009). This approach has already led to 
many major findings about gene expression in the cell, including 
translational pausing and the effect of various regulators of 
translation. 

The Ribosome Is a Dynamic Rather Than Static Structure

That the ribosome cannot be static was understood right from 
the outset because during translocation after peptidyl transfer, 
the tRNAs and mRNA must move by precisely one codon with 
respect to the ribosome in order to allow a new cycle of elonga- 
tion. Bretscher had proposed long ago that tRNAs move in two 
steps (Bretscher, 1968), first with respect to one subunit to 
form “hybrid” states and then with respect to the other, possibly 
coupled by a movement of the two subunits relative to each 
other. This hypothesis also provided a rationale for the existence 
of two subunits of ribosomes in all species. However, it lan- 
guished for almost two decades until it was shown decisively 
that tRNAs move first with respect to the large subunit and 
then to the small (Moazed and Noller, 1989b). 

Translocation could involve not just a movement of tRNAs but 
also a coupled rotation of the two subunits. Such a rotation was 
indeed observed by cryoEM (Frank and Agrawal, 2000). More- 
over, even steps that were previously thought to be localized, 
such as decoding, which involves selection of a cognate tRNA 
corresponding to a codon, were shown to involve an induced 
conformational change (Pape et al., 1999). It is now known that 
major conformational changes are associated with virtually every 
step of translation including decoding and peptidyl transfer. 
Moreover, beginning with a study on tRNA associated with 
EF-Tu (Valle et al., 2002), studies have shown that the tRNAs 
themselves not only move, but distort considerably as they go 
through the ribosome. 

Recent Advances in CryoEM Are Revolutionizing 
Structural Studies on the Ribosome 

The development of cryoEM for asymmetric structures made 
a major impact right from the first structures of a ribosomal func- 
tional complex (Frank et al., 1995). Initially, the resolutions were 
not sufficient to derive molecular details without prior informa- 
tion, but they were nevertheless valuable to obtain structures 
of many states of the ribosome that could not be crystallized 
and also to analyze conformational changes. Recently however, 
as a result of direct electron detectors and improved software, it 
is possible to obtain near-atomic structures of the ribosome from 
only ~30,000 particles (Bai et al., 2013). Moreover, a complete 
atomic structure of the yeast mitochondrial large subunit has 
been determined de novo by cryoEM even when there were no 
known structural homologs for about half of the constituent 
proteins (Amunts et al., 2014). CryoEM is revolutionizing the 
structural biology of large complexes. Apart from eliminating 
the need for crystals, the method requires orders of magnitude 
less material, and most importantly can computationally sort 
both biochemical and conformational heterogeneity in a sample. 
It is thus ideal for studying complexes that are unstable or 
dynamic. For translation, this method will help solve more 
complex problems such as the structures of eukaryotic initiation 
complexes, of ribosomal complexes with membrane-bound 
receptors as well as with factors involved in quality control. It is 
safe to predict that for large complexes, cryoEM will largely 
supersede crystallography. 

Conclusions 

Those of us who began studying translation several decades ago 
could not have dreamed of the state of our understanding today
or the dizzying array of technologies that allow us to tackle
harder problems in greater detail. We are now in a position not
only to understand the mechanism of the ribosome itself, but 
also how it is made, assembled, and regulated. We can visualize 
its interaction with quality control factors in the cell, see how 
viruses hijack it, and understand whether ribosomes are special- 
ized for various functions. It is possible that 40 years hence our 
understanding of translation and its regulation will be as amazing 
to us now as today’s knowledge would have been 40 years ago. 

To celebrate a century of X-ray crystallography, I describe how 100 crystal structures influenced
chromatin and transcription research. 



Introduction 

When Max von Laue first illuminated a crystal with X-rays 100 
years ago, it was unclear what the obtained diffraction pattern 
meant. William Henry Bragg and his son Lawrence, however, 
soon realized that X-ray diffraction provided information about 
the inner structure of crystals. After decades of elaboration,
X-ray crystallography advanced to become the most widely 
used method for the determination of 3D structures. Its application
to biological macromolecules fostered the development
of molecular biology in the second half of the 20th
century. Crystallography defined biological paradigms such 
as molecular recognition, enzymatic catalysis, and allosteric 
regulation. 

Until about 30 years ago, researchers could still follow publica- 
tion of all new crystal structures of biomolecules. But then the 
pace at which new structures were solved increased rapidly 
due to the advent of new enabling techniques. Proteins were ob- 
tained in recombinant form, and nucleic acids were synthesized 
in large quantities. Crystals were cryo-cooled to slow down ra- 
diation damage. Synchrotron X-ray sources improved, and fast 
X-ray detection devices emerged. Modern computers and better 
software for structure determination became available. By now, 
100,000 structures have been deposited in the protein database.
Many of these revealed the inner workings of molecular ma- 
chines, allowing researchers to rationalize phenotypes of muta- 
tions and to engineer biological processes. 

Crystal structures can be like landmarks. They can guide us on 
our way toward a better understanding of a biological process 
(Shi, 2014 [this issue of Cell]). Landmark structures not only 
disclose some of life’s secrets, but they also open up new fron- 
tiers. Here, I describe many of the structures that I consider to be 
landmark structures in the biology of chromatin, transcription, 
and epigenetics. I hope the resulting list of about 100 crystallo- 
graphic structures, along with several structures obtained by 
other methods, exemplifies how structural information influ- 
enced the community and led to new concepts. 

How DNA Is Structured

The proposal of the double-helical structure of DNA relied on
X-ray diffraction patterns of DNA fibers obtained in the middle
of the last century (Watson and Crick, 1953) (Figure 1). The direct 
observation of a nucleic acid duplex, however, had to await the 
crystal structure of a transfer RNA (tRNA) from yeast in 1974 
(Robertus et al., 1974) (Figure 1). The structure of a B-DNA
duplex was solved only after DNA synthesis methods became
available (Wing et al., 1980) (Figure 1). Crystal structures of
DNA in A-form (Shakked et al., 1981) and Z-form (Wang et al.,
1979) highlighted the sequence-dependent conformational flexibility
of DNA.

In eukaryotic nuclei, DNA is packaged with histone proteins 
into chromatin. The fundamental unit of chromatin, the nucleosome 
core particle, was elucidated structurally in 1984 at a resolution 
of 7 Å (Richmond et al., 1984). When the resolution 
reached 2.8 Å, a detailed view of the nucleosome emerged 
that revealed the DNA conformation and DNA interactions with 
histones (Luger et al., 1997). The nucleosome core structure 
confirmed the structure of the free histone octamer (Arents 
et al., 1991). It further showed that the histone protein tails 
emerged between and around DNA duplexes to become available 
for interactions with other nucleosomes or chromatin. 

The structure of higher-order chromatin is dynamic, but a 
complex of four nucleosomes could be crystallized and showed 
two stacks of nucleosomes and DNA that zigzagged between 
them (Schalch et al., 2005). Electron microscopy revealed how 
such tetranucleosome units may be arranged within a 30 nm 
fiber of chromatin (Song et al., 2014) and provided evidence 
for helical order in such fibers (Scheffer et al., 2011). Another 
electron microscopic study provided an alternative fiber model 
(Robinson et al., 2006). These results explained how extended 
DNA molecules can be packaged in the cell nucleus but also provided 
models for how chromatin regulates the accessibility of 
genes and their transcription. Whereas a compact chromatin 
structure can cause gene repression, gene activation requires 
chromatin opening and assembly of the transcription machinery 
at the promoter. 

How DNA Is Recognized 

To enable transcription, cells use transcription factors that bind 
to specific DNA sites. The first crystal structures of transcription 
factors included the bacterial catabolite activator protein CAP 
(McKay and Steitz, 1981) (Figure 1) and the bacteriophage 
lambda proteins cro (Anderson et al., 1981) and repressor 
(Pabo and Lewis, 1982). These structures contained helix-turn-helix 
regions that were involved in DNA binding and led to the 
concept of DNA-binding protein motifs. The studies of the bacte- 
riophage proteins required protein overexpression because 
these transcription factors could not be isolated from natural 
sources in quantities required for structural studies. 






Cell 159, November 20, 2014 ©2014 Elsevier Inc. 985 




[Figure 1 timeline, 1950 to 2010: DNA double helix model; tRNA; DNA duplex; first transcription factors (CAP); first protein-DNA complexes (434 repressor); TBP; nucleosome; bacterial RNAP; first chromatin-modifying enzyme (Gcn5); first RNAP-protein complex (Pol II-TFIIS); first chromatin-remodelling factor (SNF2-type)] 

Figure 1. A Selection of Landmark Crystal Structures in the Biology of Chromatin and Transcription 

From left to right, the depicted structures are yeast tRNA, a DNA duplex, the bacterial transcription factor CAP, the bacteriophage 434 repressor protein in 
complex with DNA, the eukaryotic TATA-binding protein TBP, the nucleosome, the bacterial RNA polymerase, the histone acetyltransferase Gcn5, the eukaryotic 
RNA polymerase 10-subunit core enzyme, the complete 12-subunit RNA polymerase II complex in complex with transcription factor TFIIS, and an archaeal Swi/Snf-type 
ATPase resembling the catalytic subunit found in many chromatin remodeling complexes. DNA is shown in blue, and proteins are depicted as ribbon 
models in different colors. For details, please refer to the text. 



Structures of DNA-bound transcription factors led to the 
concept of sequence-specific DNA recognition. DNA complexes 
of the repressor proteins from bacteriophages 434 (Anderson 
et al., 1987) (Figure 1) and lambda (Jordan and Pabo, 1988) 
and of the 434 cro protein (Wolberger et al., 1988) revealed 
protein helices bound in the DNA major groove. DNA-binding 
helices were also observed in structures of homeodomains 
(Kissinger et al., 1990; Otting et al., 1990; Qian et al., 1989). In 
a “leucine zipper” of the GCN4 factor, protein helices in the 
DNA major groove were extended and used for factor dimeriza- 
tion (Ellenberger et al., 1992; Konig and Richmond, 1993). The 
transcription factors recognized target sequences via interac- 
tions of amino acid residues with DNA base edges. Such “direct 
readout” can be complemented by “indirect readout” of the 
DNA conformation via protein-DNA backbone contacts (Lesser 
et al., 1990; Otwinowski et al., 1988). 

Later protein-DNA complex structures revealed a variety of 
DNA-binding structural motifs. The transcription factor NF-κB 
uses a β barrel fold to contact DNA via protein loops (Becker 
et al., 1998; Ghosh et al., 1995; Muller et al., 1995). Transcription 
factors of the zinc finger family recognize DNA with small protein 
folds that are stabilized by zinc ions (Fairall et al., 1993; Luisi 
et al., 1991; Marmorstein et al., 1992; Pavletich and Pabo, 
1991). Zinc fingers were later used for the design of proteins 
with new DNA-binding specificities (Choo et al., 1994). This catalyzed 
the development of protein and genome engineering as 
new research fields. Zinc fingers were also present in Klf4 
(Schuetz et al., 2011), which, together with transcription factors 
Oct4, Sox2, and c-Myc, enables reprogramming of the genome 
and generation of induced pluripotent stem cells. 

Crystallography also showed how transcription factors bind to 
adjacent DNA sites for combinatorial gene regulation. The DNA-bound 
structures of yeast MATα2 interacting with MATa1 (Li 
et al., 1995) and with MCM1 (Tan and Richmond, 1998) revealed 
factor-factor interactions that underlie synergistic DNA binding. 
This concept held for human transcription factors (Piper et al., 
1999). Oct4 and Sox2 can also bind to neighboring DNA sites 
(Remenyi et al., 2003). Crystallography also led to a model 
of an “enhanceosome” containing eight transcription factors 
bound to DNA (Panne et al., 2007). Here, binding of one factor in- 
duces a DNA conformation that promotes binding of a neigh- 
boring factor. 

Transcription factors can also bend DNA dramatically. The 
bacterial CAP protein bends DNA by 90 degrees to enable spe- 
cific DNA recognition (Schultz et al., 1991). The eukaryotic TATA 
box-binding protein (TBP) also introduces a 90 degree bend into 
DNA (Kim et al., 1993a, 1993b). The integration host factor (Rice 
et al., 1996) and the mitochondrial transcription factor A (Ngo 
et al., 2011; Rubio-Cosials et al., 2011) can even bend DNA by 
180 degrees, inducing a U-turn. To achieve DNA bending, pro- 
teins can use two strategies. They can insert amino acid residues 
like wedges between DNA base pairs to disrupt base stacking 
and may also neutralize backbone charges on one side of 
DNA, which induces bending due to the repulsion of phosphates 
on the opposite strand. 

How DNA Binding Is Regulated 

Crystallography further established concepts that explained 
bacterial gene regulation. In the trp operon, the Trp repressor 
protein inhibits expression of enzymes required for tryptophan 
biosynthesis when enough of the amino acid is available. The 
structure of the Trp repressor revealed a homodimer with a 
DNA-binding helix in each monomer (Schevitz et al., 1985). Binding 
of the regulator tryptophan alters the relative position of 
the two helices to enable DNA binding and gene repression 
(Otwinowski et al., 1988). These studies also showed that water 
molecules in the protein-DNA interface may contribute to 
sequence-specific DNA recognition. 

Many transcription factors contain not only a DNA-binding 
domain but also additional domains that can activate transcrip- 
tion or bind other transcription factors or small-molecule regula- 
tors. In the bacterial lac operon, the Lac repressor protein binds 
DNA to control the expression of enzymes involved in lactose 
metabolism. The Lac repressor contains a domain that binds 
the regulator lactose (Friedman et al., 1995) and a domain that 
binds DNA (Bell and Lewis, 2000; Lewis et al., 1996). The regu- 
lator allosterically changes the orientation of the DNA-binding 
domains, leading to DNA dissociation and gene activation. 

Eukaryotic transcription factors are generally also modular. 
The nuclear receptor for the hormone estrogen comprises a 
DNA-binding domain (Schwabe et al., 1993) and a hormone- 
binding domain (Brzozowski et al., 1997). Binding of estrogen 
influences receptor dimerization and its interaction with other 
factors that regulate transcription. The tumor suppressor p53 
contains a tetramerization domain that clamps four p53 subunits 
together (Clore et al., 1994) and a DNA-binding domain that harbors 
tumorigenic mutations that can impair DNA binding (Cho 
et al., 1994). 

Transcription factors are often regulated by nuclear localization. 
For example, the inhibitory protein IκB retains the transcription 
factor NF-κB in the cytoplasm by masking its nuclear 
localization sequence (Huxford et al., 1998). External signals 
remove IκB, leading to nuclear import of NF-κB, DNA binding, 
and gene activation. Similarly, external signals trigger phosphorylation 
of cytosolic STAT transcription factors (Becker et al., 
1998), which leads to factor dimerization, nuclear import, DNA 
binding, and gene activation. 

How DNA Directs RNA Synthesis: Single-Subunit 
RNA Polymerases 

It took until the late 1990s to obtain a crystallographic view of 
transcription. The shape of the single-subunit DNA-dependent 
RNA polymerase from bacteriophage T7 was observed in a me- 
dium-resolution crystallographic study (Sousa et al., 1993). The 
crystal structure of T7 RNA polymerase revealed an active center 
cleft and a domain that binds promoter DNA and enables 
sequence-specific initiation of RNA synthesis (Jeruzalmi and 
Steitz, 1998; Cheetham et al., 1999). In the structure of a T7 
RNA polymerase transcribing complex, the DNA template strand 
formed a hybrid duplex with a transcript of three nucleotides at 
the active site (Cheetham and Steitz, 1999). This suggested that 
the polymerase could only hold a three base pair DNA-RNA 
hybrid, but later structures of the T7 elongation complex revealed 
extensive refolding of the polymerase, which then accommo- 
dated a seven to eight base pair hybrid (Tahirov et al., 2002; Yin 
and Steitz, 2002). 

These studies established many concepts in DNA-directed 
RNA synthesis. They highlighted the importance of promoter 
recognition and showed how ribonucleotide substrates are 
selected over deoxyribonucleotides to prevent synthesis of 
DNA. Addition of nucleotides to the RNA occurs in two steps 
by first binding the substrate nucleoside triphosphate to a prein- 
sertion site and then moving it to the insertion site for catalysis 
(Temiakov et al., 2004). Details of RNA synthesis were revealed 
by X-ray studies of the single-subunit RNA polymerase from 
bacteriophage N4 (Basu and Murakami, 2013). 

Eukaryotic cells also contain a single-subunit RNA poly- 
merase, the polymerase transcribing the mitochondrial genome. 
Mitochondrial RNA polymerase structurally resembles T7 RNA 
polymerase but contains an additional region for promoter binding 
(Ringel et al., 2011). In contrast to T7 RNA polymerase, however, 
mitochondrial RNA polymerase does not refold during the 
transition from transcription initiation to elongation 
(Schwinghammer et al., 2013). 

How DNA Is Transcribed in Cells: Multisubunit RNA 
Polymerases 

The first structures of cellular RNA polymerases were obtained 
at the turn of the millennium. The structure of bacterial RNA 
polymerase from Thermus aquaticus (Zhang et al., 1999) was 
followed by the structure of yeast RNA polymerase II (Pol II), 
which synthesizes messenger RNA (Cramer et al., 2000, 2001) 
(Figure 1). The structures revealed functional elements and 
enabled structure-function analysis of cellular transcription. 
They also suggested that catalysis followed a two-metal ion 
mechanism (Cramer et al., 2000), as proposed for all nucleotide 
polymerases (Steitz and Steitz, 1993). Comparison of bacterial 
and eukaryotic RNA polymerases revealed a conserved multisu- 
bunit architecture and an active center cleft with a flexible bridge 
helix for the translocation of the polymerase relative to DNA. 

The structure of a Pol II transcription elongation complex 
showed that the polymerase clamp domain closed over a 
DNA-RNA hybrid of eight to nine base pairs during transcription 
elongation and suggested the basis for nucleic acid strand sep- 
aration during transcription (Gnatt et al., 2001). Later structures 
of bacterial RNA polymerase and Pol II transcription elongation 
complexes with bound nucleoside triphosphate substrate (Vas- 
sylyev et al., 2007; Wang et al., 2006) revealed that folding of 
the polymerase trigger loop closed the active site and sug- 
gested mechanisms of substrate selection. The same mecha- 
nisms likely occur in archaeal RNA polymerases, which show 
a remarkable similarity to Pol II (Hirata et al., 2008; Korkhin 
et al., 2009). 

The first structure of Pol II with a bound transcription factor, the 
elongation factor TFIIS, showed that a single “tunable” active 
site was used both for RNA synthesis and RNA cleavage and 
indicated the mechanism of proofreading during transcription 
(Kettenberger et al., 2003) (Figure 1). Electron microscopy of 
an analogous bacterial complex revealed a similar topology 
(Opalka et al., 2003). The structure of an “arrested” Pol II elonga- 
tion complex with a backtracked RNA provided insights into how 
TFIIS rescues polymerase that stalled during transcription 
(Cheung and Cramer, 2011). Crystal structures of a second eu- 
karyotic RNA polymerase, Pol I, showed that a subunit corre- 
sponding to TFIIS was located at the active center of this enzyme 
(Engel et al., 2013; Fernandez-Tornero et al., 2013). 

Transcription is coordinated with cotranscriptional events 
such as RNA processing. This coordination is to a large extent 
achieved by binding of factors to the flexible C-terminal domain 
(CTD) of Pol II. Changes in CTD phosphorylation lead to an ex- 
change of protein factors during transcription. The first struc- 
tures of CTD peptides bound to CTD-binding proteins showed 
that the CTD adopts multiple conformations and revealed the ba- 
sis for phosphorylation-specific binding (Fabrega et al., 2003; 
Meinhart and Cramer, 2004; Verdecia et al., 2000). Crystallog- 
raphy also revealed the mechanisms and the determinants for 
substrate specificity of CTD kinases (Baumli et al., 2012; Baumli 
et al., 2008; Bosken et al., 2014; Lolli et al., 2004; Schneider et al., 
2011; Tahirov et al., 2010) and CTD phosphatases (Ghosh et al., 
2008; Kamenski et al., 2004; Xiang et al., 2010). 
How Transcription Starts 

For the initiation of transcription, RNA polymerases cooperate 
with initiation factors to locate and open promoter DNA. The 
structure of the Pol II initiation factor TBP revealed a saddle- 
shaped molecule (Nikolov et al., 1992) (Figure 1) that bound 
the DNA minor groove and bent DNA by 90 degrees (Kim 
et al., 1993a, 1993b). The resulting TBP-DNA complex can re- 
cruit initiation factors TFIIA (Geiger et al., 1996; Tan et al., 1996) 
and TFIIB (Nikolov et al., 1995) on either side. Electron micro- 
scopy revealed the overall architectures of the large multipro- 
tein initiation factors TFIID (Andel et al., 1999; Bieniossek 
et al., 2013; Cianfrocco et al., 2013; Leurent et al., 2002) and 
TFIIH (Chang and Kornberg, 2000; Gibbons et al., 2012; Schultz 
et al., 2000). Crystal structures for individual parts of TFIID, 
TFIIE, TFIIF, and TFIIH were reported, including an archaeal ho- 
molog of the ATPase in TFIIH (Fan et al., 2006) that functions in 
DNA opening. 

An understanding of transcription initiation, however, had to 
await structures of initiation factor complexes with RNA polymerases. 
An initial structure of a partial Pol II-TFIIB complex 
(Bushnell et al., 2004) was consistent with crosslinking results 
(Chen and Hahn, 2003) and revealed a domain of TFIIB that 
bound Pol II to recruit it to the promoter. The structure of 
the Pol II-TFIIB complex (Kostrewa et al., 2009; Liu et al., 
2010) enabled modeling of initiation complexes with closed 
double-stranded and open DNA. A subsequent structure of 
a Pol II-TFIIB complex with bound DNA and an initial RNA 
transcript showed that TFIIB alters the polymerase active 
site to allosterically activate RNA synthesis (Sainsbury et al., 
2013). 

Bacterial transcription initiation relies on sigma factors. The 
structure of a free sigma factor revealed its modular nature 
(Malhotra et al., 1996). Structures of RNA polymerase with 
bound sigma factor (Murakami et al., 2002b; Vassylyev et al., 
2002) and with sigma factor and promoter DNA (Murakami 
et al., 2002a) showed that sigma factor bridges between poly- 
merase and the promoter and suggested how sigma factor do- 
mains recognize DNA sequence elements. When promoter 
DNA is unwound, one sigma factor domain traps the nontem- 
plate single strand of DNA (Feklistov and Darst, 2011). An alter- 
native sigma factor is able to stabilize a flipped-out base from 
the nontemplate strand during DNA melting (Campagne et al., 
2014). Structures of bacterial transcription initiation complexes 
showed how RNA polymerase and sigma factor cooperate to 
recognize promoter sequences, unwind DNA, and “preorganize” 
the template strand for RNA chain initiation (Zhang et al., 
2012). 

The initiation factors sigma and TFIIB perform similar func- 
tions, including DNA binding and opening and defining the start 
site of transcription. Comparison of bacterial and eukaryotic 
structures showed that sigma factor and TFIIB interact with 
roughly the same parts of their RNA polymerases but have unre- 
lated folds, arguing for convergent evolution. Recently, electron 
microscopy and crosslinking provided the location of additional 
initiation factors in human and yeast Pol II initiation complexes 
(Grunberg et al., 2012; He et al., 2013; Muhlbacher et al., 2014; 
Murakami et al., 2013), enabling further crystallographic studies 
of transcription initiation. 



How Transcription Initiation Is Regulated 

Bacterial transcription can be regulated by direct interactions of 
transcription factors with the general transcription machinery 
and DNA. The CAP factor activates transcription by binding 
DNA and an adjacent domain of the polymerase, thereby recruit- 
ing the polymerase to the promoter (Benoff et al., 2002). Simi- 
larly, a protein of bacteriophage lambda activates transcription 
by binding to DNA and an adjacent domain of sigma (Jain 
et al., 2004). Eukaryotic transcription factors normally bind to co- 
activator complexes, which then bind the general transcription 
machinery. Coactivator binding requires activation domains, 
which are often unstructured in their free state but can adopt 
short helical structures upon coactivator binding (Brzovic et al., 
2011; Kussie et al., 1996; Radhakrishnan et al., 1997). Conversely, 
a coactivator can also form an α helix to bind a transcription factor 
(Shiau et al., 1998). 

Many transcription factors bind to the coactivator complex 
Mediator, which consists of 25-35 subunits arranged in four 
modules. The crystal structure of the Mediator head module re- 
vealed an intricate fold with surfaces for interactions with Pol II 
and other Mediator modules (Imasaki et al., 2011; Lariviere 
et al., 2012; Robinson et al., 2012). Recent electron microscopy 
revealed the central location of the head module within the overall 
Mediator architecture (Tsai et al., 2014; Wang et al., 2014). The 
mechanisms by which Mediator influences transcription remain 
to be explored, but there is evidence for a conformational change 
in Mediator induced by binding of a transcription activator 
(Meyer et al., 2010; Taatjes et al., 2002). 

Transcription initiation is also regulated by methylation of DNA 
upstream of the promoter. In higher cells, this often occurs in 
CpG islands, which are DNA regions enriched for CpG dinucleo- 
tides. Hypermethylation of CpG islands generally leads to tran- 
scription repression. In a structure of a methyltransferase-DNA 
substrate complex, the target base was flipped out of the DNA 
double helix and inserted into the enzyme’s active site (Klima- 
sauskas et al., 1994). A human DNA methyltransferase uses 
additional domains to ensure that only hemimethylated CpG di- 
nucleotides undergo methylation after replication, as seen in the 
enzyme-DNA complex structure (Song et al., 2011; Song et al., 
2012). 

How Chromatin Regulates Transcription 

To make chromatin accessible for transcription, remodeling 
complexes use ATP hydrolysis to change nucleosome position 
and structure. Many remodelers contain a Swi/Snf family 
ATPase that induces DNA translocation with respect to histones 
(Durr et al., 2005; Thoma et al., 2005) (Figure 1). A combination of 
structural techniques provided the architecture of chromatin remodelers 
ISW1a (Yamada et al., 2011), INO80 (Saravanan et al., 
2012; Tosi et al., 2013), and SWR1 (Nguyen et al., 2013) and indicated 
how these machines bind nucleosomes, although the 
structure of the “remodeled nucleosome state” has remained 
elusive. Remodeling complexes not only bind nucleosomes— 
they can also be regulated by nucleosomes allosterically (Clapier 
and Cairns, 2012; Hauk et al., 2010; Racki et al., 2014). 

Proper nucleosome assembly by histone chaperones is 
required to repress cryptic transcription that can produce aberrant 
RNAs from nonpromoter regions. X-ray studies unraveled 
chaperone structures and showed that chaperones disrupt 
histone interfaces or mimic nucleosomal DNA to prevent promiscuous 
histone interactions prior to their assembly into 
nucleosomes (Elsasser et al., 2012; English et al., 2006; Hondele 
et al., 2013; Hu et al., 2011; Park and Luger, 2006). Certain as- 
sembly factors incorporate histone variants into nucleosomes 
at specific sites. The histone variant H2A.Z changes nucleosome 
structure at active promoters (Suto et al., 2000) and is removed 
by a specific chaperone (Obri et al., 2014), and nucleosomes 
containing the centromeric histone H3 variant are apparently de- 
stabilized compared to canonical nucleosomes (Tachiwana 
et al., 2011). 

Structural studies also revealed details on how proteins recog- 
nize nucleosomes. A structure of a nucleosome in complex with 
RCC1, a regulator of chromatin condensation, showed how this 
protein recognizes both histones and DNA to specifically bind 
nucleosomes (Makde et al., 2010). Another nucleosome struc- 
ture in complex with the gene-silencing factor Sir3 indicated 
how protein-nucleosome interactions are regulated by modifica- 
tions in histone tails (Armache et al., 2011). 

How Chromatin Marks Function 

Covalent histone modifications provide another layer of gene 
regulation (Strahl and Allis, 2000). Histone modifications include 
acetylation, methylation, phosphorylation, and ubiquitination 
and can be associated with active transcription or repression. 
Enzymes that set or remove these marks are known as “writers” 
and “erasers,” respectively. The structure of the histone acetyl- 
transferase Gcn5 provided insights into how chromatin marks 
are written (Rojas et al., 1999) (Figure 1). The structure of a 
portion of the acetyltransferase p300 explained mutations asso- 
ciated with human cancers (Liu et al., 2008). The subunit orga- 
nization of the large acetyltransferase complex NuA4 and its 
mode of interaction with the nucleosome were revealed by elec- 
tron microscopy (Chittuluru et al., 2011). Crystal structures of 
histone methyltransferases revealed their mechanism and how 
specificity for histones was achieved (Kwon et al., 2003; Min 
et al., 2002; Wilson et al., 2002; Xiao et al., 2003; Zhang et al., 
2002). 

Crystallography also showed how eraser enzymes work. The 
structure of a bacterial homolog of a histone deacetylase (Finnin 
et al., 1999) was followed by structures of the NAD-dependent 
Sir2 enzyme (Avalos et al., 2002; Finnin et al., 2001; Min et al., 
2001) and eukaryotic zinc-dependent histone deacetylases (So- 
moza et al., 2004; Vannini et al., 2004). Histone demethylase 
structures of the LSD1 (Chen et al., 2006; Stavropoulos et al., 
2006; Yang et al., 2006) and JmJ (Ng et al., 2007) classes re- 
vealed the basis for substrate specificity. The structure of the 
four-subunit deubiquitination module of the SAGA complex pro- 
vided the basis for substrate specificity and activation of this 
eraser (Kohler et al., 2010; Samara et al., 2010). 

Histone marks can recruit proteins via specific “reader” do- 
mains. The bromodomain binds acetylated lysine residues, as 
observed for the factors P/CAF, Taf1, and Gcn5 (Dhalluin 
et al., 1999; Jacobson et al., 2000; Owen et al., 2000). The chro- 
modomain binds methylated lysines, as exemplified by the HP1 
chromodomain bound to a histone H3 peptide methylated at 
lysine-9 (Jacobs and Khorasanizadeh, 2002). The PHD finger 
domain also binds a methylated lysine residue by trapping it 
into an aromatic cage that is lined with residues mutated in can- 
cer (Li et al., 2006; Pena et al., 2006). The pockets in reader and 
eraser proteins were consequently explored for drug design (Fil- 
ippakopoulos et al., 2010). Multiple histone marks can be read 
by a single reader domain (Moriniere et al., 2009) or by combinatorial 
binding of readers (Jacobson et al., 2000; Tsai et al., 2010; 
Xi et al., 2011). Multiple histone marks influence transcription 
activity, for example, via the initiation factor TFIID that binds his- 
tone tails with marks for active transcription (Vermeulen et al., 
2007). 

Toward Structural Cell Biology 

In the coming years, X-ray crystallography will likely continue to 
provide landmark structures that elucidate unknown mecha- 
nisms in chromatin and transcription biology. However, many 
proteins that function in chromatin and transcription are modular 
by design. To resolve the structure of flexible factors and tran- 
sient multicomponent complexes, crystallography will often be 
integrated with complementary techniques. Of particular impor- 
tance will be electron microscopy, which enables placement of 
crystal structures of complex components but can now also 
reach high resolution that enables building of atomic models 
(Kuhlbrandt, 2014; Wong et al., 2014). Crosslinking and mass 
spectrometry (Gingras et al., 2007; Serpa et al., 2012) will be 
routinely used to derive the relative position of known structures 
and to support models obtained by a combination of electron mi- 
croscopy and crystallography. 

A central future challenge will be the analysis of molecular 
structures within their cellular context and of structural changes 
in space and time. Advanced light microscopy techniques can 
now resolve detailed structures of assemblies such as nuclear 
pore complexes (Szymborska et al., 2013) or the cytoskeleton 
(Xu et al., 2013). A combination of in vivo crosslinking with 
deep sequencing and computer simulations can elucidate the 
overall folding of chromosomes (Naumova et al., 2013). Electron 
tomography provides three-dimensional images of the nuclear 
pore complex (Beck et al., 2004; Bui et al., 2013) or polysomes 
(Brandt et al., 2009). 

We are witnessing the advent of a new research field that may 
be referred to as structural cell biology. Structural biologists may 
soon tackle most fundamental questions in biology. What is the 
conformational space for genomes, and how is it explored and 
utilized during gene activation and cell differentiation? What is 
the three-dimensional structure of genes and how does it change 
during transcription? What is the dynamic architecture of tran- 
sient RNA assemblies with multiple proteins? Answers to these 
questions will require the development of new techniques and al- 
gorithms to bridge resolution gaps, to integrate structural data 
from multiple sources, and to embed structures into their biolog- 
ical context. 

Since determination of the myoglobin structure in 1957, X-ray crystallography, as the anchoring tool 
of structural biology, has played an instrumental role in deciphering the secrets of life. Knowledge 
gained through X-ray crystallography has fundamentally advanced our views on cellular processes 
and greatly facilitated development of modern medicine. In this brief narrative, I describe my 
personal understanding of the evolution of structural biology through X-ray crystallography — using 
as examples mechanistic understanding of protein kinases and integral membrane proteins — and 
comment on the impact of technological development and outlook of X-ray crystallography. 



Brief History 

When Wilhelm Roentgen discovered X-rays in 1895, he could not 
have imagined the powerful applications of X-ray diffraction on 
crystals of biological samples. Max von Laue demonstrated X-ray 
diffraction by crystals in 1912, and William Lawrence 
Bragg derived a general equation, known as Bragg’s law, 
to describe the founding principle of image formation by X-ray 
diffraction (Bragg, 1913) (Figure 1). James Sumner obtained 
the first crystal of jack bean urease in 1926 and showed the 
enzyme to be a protein (Figure 1). Max Perutz and John Kendrew 
decided to pursue crystal structures of proteins, namely hemoglobin 
and myoglobin, beginning in the 1940s at the Cavendish Laboratory, 
University of Cambridge. Their pioneering effort was 
encouraged by William Lawrence Bragg, who served as the Di- 
rector of the Cavendish Laboratory between 1938 and 1954. In 
1953, James Watson and Francis Crick, both employed at the 
Cavendish Laboratory, deduced a DNA double-helix model on 
the basis of X-ray fiber diffraction images of DNA generated by 
Rosalind Franklin (Watson and Crick, 1953). 

The entire biological research community was both excited 
and shocked to see the very first crystal structure of a macromolecule 
in 1957, that of sperm whale myoglobin by John Kendrew 
(Kendrew et al., 1958). The structure of myoglobin, initially determined 
at 6 Å resolution but quickly improved to 2 Å (Kendrew 
et al., 1960), confirmed the α-helical conformation as proposed 
by Linus Pauling and Robert Corey (Pauling and Corey, 1951a, 
1951b, 1951c; Pauling et al., 1951). Kendrew’s success in structure 
determination of myoglobin was critically aided by 
Perutz’ solution to the phase problem: multiple isomorphous 
replacement through heavy atom soaks. Max Perutz presented 
his own X-ray structure of the larger protein hemoglobin at 
5.5 Å (Perutz et al., 1960) and took a few years to improve the resolution 
to 2.8 Å (Perutz et al., 1968a, 1968b). Kendrew founded 
the Journal of Molecular Biology and served as Editor-in-Chief 
for a number of years. Kendrew also helped establish the European 
Molecular Biology Laboratory in Heidelberg and became 
its founding director. Perutz, on the other hand, founded and 
directed the MRC Laboratory of Molecular Biology (Figure 1). 
Notably, the double-helix structure of DNA was finally visualized 
in 1980 by the X-ray structure of a 12-base-pair palindromic 
DNA, known as the Dickerson dodecamer (Wing et al., 1980). 

DNA is the genetic material of almost all living organisms, and 
proteins are the engines of life. Structural elucidation of DNA 
and protein is arguably the most important scientific discovery 
of the 20th century. Proposal of the double-helix structure of 
DNA has fundamentally changed our perception of life and has 
ushered in a new era of modern biology. Crystal structures of 
myoglobin and hemoglobin allowed us to link protein function 
to its chemical details. In many respects, the atomic details 
offered by X-ray crystallography allowed mechanistic under- 
standing of protein function, which marks the beginning of mo- 
lecular biology. Kendrew and Perutz have been fondly named 
fathers of molecular biology. 

Early crystallographic studies focused on abundant proteins, 
most often enzymes, from animal organs and tissues. Following 
the successes on myoglobin and hemoglobin, structural infor- 
mation was obtained for at least seven additional proteins in 
the 1960s, including the first enzyme, hen egg white lysozyme
(Blake et al., 1965), ribonucleases A and S (Kartha et al., 1967;
Wyckoff et al., 1967), chymotrypsin (Matthews et al., 1967),
papain (Drenth et al., 1968), carboxypeptidase A (Lipscomb
et al., 1969), and subtilisin (Wright et al., 1969). These structures,
together with those of many other enzymes in the 1970s and 
beyond, reveal the active site conformations and catalytic mech- 
anisms, which form the physical basis of molecular enzymology. 

The Protein Data Bank (PDB), a central repository for three-dimensional
structural data of macromolecules, was established
in 1971 at the Brookhaven National Laboratory with seven initial
entries. As of August 26, 2014, there were 102,863 total entries
in the PDB, of which 88.7% were determined by X-ray crystallography,
10.3% by nuclear magnetic resonance (NMR), and 0.8%
by electron microscopy (EM) (Figure 2A). Following structure
determination of the lysozyme from bacteriophage T4 (T4 lysozyme)
(Matthews and Remington, 1974), this enzyme became a paradigm
for the study of protein folding and thermodynamics (Baase
et al., 2010). In addition to the study of enzymes, viruses became
hotly pursued in the 1960s and 1970s. Following his pioneering
research into the tomato bushy stunt virus (Harrison, 1969; Harrison
and Jack, 1975), Stephen Harrison and colleagues reported
the first virus structure at an atomic resolution of 2.9 Å,
revealing 180 copies of the capsid protein arranged in an icosahedral
particle (Harrison et al., 1978) (Figure 2A). Aaron Klug and
colleagues solved the X-ray structure of the TMV protein disk at
2.8 Å resolution (Bloomer et al., 1978) and obtained a low-resolution
structure of the nucleosome core particle (Finch et al., 1977;
Richmond et al., 1984).

Figure 1. The History of X-Ray Crystallography
in the Eyes of Nobel Prizes

Major achievements in the development and application
of X-ray crystallography have been recognized
by at least 14 Nobel Prizes. The first Nobel
Prize in physics was awarded to Roentgen in 1901
for his discovery of X-rays. The next two Nobel
Prizes in 1914 and 1915 were given to Laue for his
discovery of X-ray diffraction by crystals and to the
Bragg father and son for the use of X-rays to
determine crystal structure. Sumner was awarded a
Chemistry Prize in 1946 for crystallization of the
enzyme urease. Pauling won a Chemistry Prize in
1954 for his research into the nature of the chemical
bond and its application in structure determination.
The year 1962 was quite special, with the Chemistry
Prize awarded to Kendrew and Perutz for their
pioneering work in protein structure determination,
and the Physiology or Medicine Prize bestowed on
Crick, Watson, and Wilkins for their contribution to
the discovery of DNA structure. Hodgkin was
awarded a Chemistry Prize in 1964 for structural
elucidation of many biochemical substances,
including vitamin B12. The next six Nobel Prizes
were awarded to macromolecular crystallographers:
Deisenhofer, Huber, and Michel in 1988 for
the structure of the bacterial photosynthetic reaction
center; Walker in 1997 for the structure of F1-ATPase;
MacKinnon in 2003 for potassium channels;
Kornberg in 2006 for the structure of RNA
polymerases; Ramakrishnan, Steitz, and Yonath in
2009 for the structure of the ribosome; and Kobilka in
2012 for the structure of GPCR. Listed in the lower
left corner are nine Nobel Prizes that are closely
related to X-ray crystallography.

Figure 2. Entries in the Protein Data Bank
Have Enjoyed Near-Exponential Growth in
the Past 30 Years

(A) The total number of entries in the PDB. The PDB
was established in 1971 at the Brookhaven National
Laboratory with only seven entries. The total
number of entries grew to 69 in 1980, 507 in 1990,
13,597 in 2000, and 70,039 in 2010. As of August
26, 2014, there were 102,863 total entries in the
PDB, of which 88.7% were determined by X-ray
crystallography, 10.3% by NMR, and 0.8% by EM.
Some of the representative X-ray structures are
indicated in the chart.

(B) The total number of unique membrane protein
structures in the PDB. In sharp contrast to their heavy
presence, structures of membrane proteins
account for just over 1% of all entries in the PDB,
with 1,520 total entries and 499 unique structures
as of August 31, 2014 (http://blanco.biomol.uci.edu/mpstruc/).

Two exciting developments in biology, the discovery of gene
regulation in the 1960s and the emergence of recombinant DNA
technology in the late 1970s, raised the
question of how DNA sequences could
be specifically recognized by transcrip- 
tion factors. Seeking an answer to this 
question became an important theme 
for structural biology in the 1980s (for 
more details, please see Cramer [2014] 
in this issue of Cell). Many DNA-binding 
motifs were identified, and a number of 
crystal structures of protein-DNA com- 
plexes were elucidated (Aggarwal et al., 
1988; Anderson et al., 1987; Otwinowski 
et al., 1988; Wolberger et al., 1988). There
were many other exciting developments 
in the 1980s. Michael Rossmann’s group 
determined the first atomic resolution 
crystal structure of a human common 
cold virus (Rossmann et al., 1985) (Fig- 
ure 2A). These early findings were fol- 
lowed up with a barrage of structural 
information on viruses, including mengo 
virus (Luo et al., 1987) and foot-and- 
mouth disease virus (Acharya et al., 
1989). Due to impending health threats, 
disease-causing viruses continue to 
attract ample attention from the structural biology community. 
Insights gained from these virus structures have, in turn, revolu- 
tionized our concepts of virology and helped vaccine design and 
drug discovery. The 1980s also mark the beginning of structural
biology on integral membrane proteins, with determination of
atomic resolution X-ray structures of the bacterial photosynthetic
reaction center (Allen et al., 1987; Deisenhofer et al., 1985).

Compared to the 1980s, the pace of structural biology was
considerably faster in the 1990s. The total number of PDB entries
increased from fewer than 400 at the beginning of 1990 to
more than 13,000 by the end of 2000 (Figure 2A). Cellular
signal transduction was the principal focus of biological investigation
in the 1990s; consequently, the central theme of structural
biology during this period was mechanistic understanding of cell
signaling. Crystal structures were elucidated for numerous
signaling motifs, exemplified by the phosphotyrosine recognition 
SH2 domain (Waksman et al., 1992), the phosphoinositol-bind- 
ing PH domain (Ferguson et al., 1995), the proline-rich sequence 
binding SH3 domain (Musacchio et al., 1992, 1994), and the 
carboxyl-terminal peptide-binding PDZ domain (Doyle et al., 
1996). The modular nature of these motifs and characterization 
of their binding specificities constitute an important basis for 
the development of synthetic biology and chemical biology. 

Signal transduction often begins in the extracellular space with 
ligand binding to its cell-surface receptor. Crystal structure of the 
complex between human growth hormone and the extracellular 
domain of its receptor revealed a single hormone molecule 
recognized by two molecules of receptor (de Vos et al., 1992) 
(Figure 2A). Some of the other early cocrystal structures include 
the extracellular domain of TNF receptor bound to TNF-β (Banner
et al., 1993), the extracellular fragment of interferon-γ receptor
bound to interferon-γ (Walter et al., 1995), and domain 2 of
the Flt-1 receptor bound to VEGF (Wiesmann et al., 1997). These
ligand-receptor structures give rise to the principle that receptor 
clustering induced by ligand binding serves as a platform for 
signaling, which usually involves phosphorylation by receptor ki- 
nases. Crystal structure of the catalytic subunit of protein kinase 
A (PKA), the first for a protein kinase, revealed an amino-terminal
lobe rich in β strands and a carboxyl-terminal lobe with
mostly α helices (Knighton et al., 1991) (Figure 2A). The kinase
fold observed in PKA was soon confirmed by dozens of crystal
structures of other important protein kinases in the 1990s. Understanding
of immune signaling by the T cell receptor (TCR)
was markedly enhanced by structures of TCRs and their complexes
with MHC-peptide antigens (Garboczi et al., 1996; Garcia et al., 
1996) (Figure 2A). 

Structural biology of membrane proteins enjoyed rapid devel- 
opment in the 1990s and beyond. As an extension of X-ray crys- 
tallography, electron crystallography was successfully applied 
to determine the structure of membrane proteins that formed 
two-dimensional crystals at low resolution in the 1980s and 
near-atomic resolution in the 1990s. Following two decades of 
method development on electron crystallography (Subramaniam 
et al., 2002), Richard Henderson and colleagues succeeded in
the elucidation of a 3.5 Å resolution structure of bacteriorhodopsin
using electron diffraction data (Grigorieff et al., 1996).
Kühlbrandt and colleagues generated a 3.4 Å resolution structure
of a plant light-harvesting complex (Kühlbrandt et al.,
1994). A striking 1.9 Å resolution was achieved for two-dimensional
aquaporin-0 crystals, which allowed clear visualization
of lipid-protein interactions (Gonen et al., 2005). 

X-ray structure of the mitochondrial F1-ATPase at 2.8 Å resolution
revealed distinct conformations for the three catalytic β
subunits (Abrahams et al., 1994). Structural analysis of a bacterial
potassium channel, the first structure of a recombinant integral
membrane protein, provided insights into potassium
conduction and selectivity (Doyle et al., 1998). This work was 
followed up with systematic structural studies of potassium 
channels, which offer molecular explanations for gating of the 
ion-conducting pore (MacKinnon, 2003). X-ray structure of the 
bacterial large-conductance mechanosensitive channel (MscL) 
revealed a homopentameric assembly and served as a foundation
for understanding other mechanosensitive channels (Chang
et al., 1998) (Figure 2B). Structural analysis of the water channel 
aquaporin-1 identified a tetrameric assembly with water mole- 
cules localized along a selectivity filter (Figure 2B) (Sui et al., 
2001). Crystal structure of the AMPA-type GluA2 receptor re- 
vealed a 2-fold symmetry in the extracellular domain and a 
4-fold symmetry in the membrane-spanning ion channel domain 
(Sobolevsky et al., 2009). The first crystal structures on recom- 
binant membrane transporters were elucidated for the lactose 
permease LacY (Abramson et al., 2003) and the glycerol-3-phosphate
transporter GlpT (Huang et al., 2003), defining a
conserved fold for the major facilitator superfamily (MFS) of secondary
active transporters (Figure 2B). Crystal structure of the
human glucose transporter GLUT1, the first eukaryotic MFS
structure, allows rationalization of disease-derived mutations
(Deng et al., 2014). Following an early X-ray structure of bovine
rhodopsin (Palczewski et al., 2000), crystal structures of a recombinant
G-protein-coupled receptor (GPCR) were reported
for the β2 adrenergic receptor (Cherezov et al., 2007; Rasmussen
et al., 2007). Structure of an agonist-bound β2 adrenergic receptor
in complex with a nucleotide-free Gs heterotrimer
provided a preliminary answer to how ligand binding to GPCR 
triggers the activation of G protein in the cytoplasm (Rasmussen 
et al., 2011). 

Improvement of hardware and software in X-ray crystallog- 
raphy has greatly accelerated the pace of discovery and em- 
boldened structural biologists to attack challenging research 
projects. Structural elucidation of the proteasomal 20S particles 
revealed the architecture and identified the proteolytic mechanisms
(Groll et al., 1997; Löwe et al., 1995; Seemüller et al.,
1995) (Figure 2A). Crystal structure of the GroES-GroEL complex
markedly improved our understanding of chaperone-assisted
protein folding (Xu et al., 1997b), whereas structural analysis of
the nucleosome core particle shows in atomic detail how 146
base pairs of DNA are assembled around the histone octamer
(Luger et al., 1997) (Figure 2A). Crystal structure of the large ribosomal
subunit, including 2833 RNA nucleotides and 27 proteins,
was determined at an atomic resolution of 2.4 Å (Ban et al., 2000)
and expanded our protein-RNA recognition database as of the
year 2000 by a factor of 6. Since the early 1990s, biologically
important megacomplexes and macromolecular assemblies 
have represented increasingly attractive targets for structural 
biologists. X-ray structures of the eukaryotic exosomes have re- 
vealed important insights into the degradation and processing of 
cellular RNA (Bonneau et al., 2009; Liu et al., 2006; Makino et al., 
2013; Wasmuth et al., 2014). Preliminary structural analysis of 
the U1, U4, and U6 snRNPs, three subcomplexes of the eukaryotic
spliceosome, provided mechanistic insights into mRNA
splicing (Leung et al., 2011; Montemayor et al., 2014; Pomeranz
Krummel et al., 2009; Zhou et al., 2014). Crystal structure of 
the human COP9 signalosome revealed molecular architecture 
of the eight-subunit complex (Lingaraju et al., 2014). The combi- 
nation of X-ray crystallography with single-particle cryo-EM 
analysis has been routinely used to provide accurate information 
on megacomplexes, as exemplified by structure of the yeast 26S 
proteasome (Beck et al., 2012). 

The vast amount of information in the PDB serves as the structural
basis for understanding biology and innovating medicine.
X-ray crystallography as a tool has played a dominant role in the
past five decades in deciphering the molecular mechanisms of
virtually all biological processes. It is impossible
to cover, even cursorily, all major
achievements of structural biology. Because this narrative is targeted
to a general readership, the goal is to showcase the spirit
of structural biology through brief description of representative
examples of how X-ray crystallography has helped transform
the understanding of biology. Therefore, I will mainly comment
on two areas of biological research that have been galvanized 
by X-ray crystallography: kinase regulation and membrane pro- 
tein structure-function relationship (for the areas of chromatin 
and transcription, please see Cramer [2014] in this issue of 
Cell). Within each research area, the focus is not to provide a 
comprehensive review but to discuss pioneering structural 
investigation and select follow-up studies. 

Kinases and Anticancer Drugs 
cAMP-Dependent PKA 

Reversible protein phosphorylation was discovered on glycogen 
phosphorylase in the 1950s by Edwin Krebs, Edward Fischer, 
Earl Sutherland, and Wosilait (Fischer and Krebs, 1955; Sutherland
and Wosilait, 1955). The first essential step in glycogen
metabolism, its phosphorolysis, is mediated by glycogen
phosphorylase, whose activation depends on phosphorylase 
kinase. The conversion of inactive phosphorylase kinase to its 
activated form is mediated by phosphorylase kinase kinase, 
also known as cyclic AMP (cAMP)-dependent protein kinase, 
or PKA. The catalytic subunit of PKA in its free form is catalyti- 
cally active; however, two molecules of the catalytic subunits 
are constitutively bound and inhibited by a homodimer of regula- 
tory subunits. Binding of the second messenger cAMP to the 
regulatory subunits results in the dissociation and hence activa- 
tion of the PKA catalytic subunits. 

The human genome is predicted to encode 518 putative protein kinases,
which, on the basis of substrate specificity, are classified
into 90 tyrosine kinases (PTKs) and 428 serine/threonine kinases 
(PSKs). These kinases play an essential role in virtually all cellular 
processes. The very first atomic view on any protein kinase was 
provided in 1991 by the crystal structure of the catalytic subunit 
of PKA (Knighton et al., 1991). This structure reveals a general 
architecture of two lobes, with the substrate-binding site and 
the ATP-binding pocket both located between the two lobes 
(Figure 3A). An activation helix (αC), which contains a catalytic
triad residue, and an activation loop, whose phosphorylation results
in the activation of the kinase, are positioned next to the
hinge between the two lobes. These structural features are
hallmarks of nearly all protein kinases.

Structures of the catalytic subunit of PKA explain substrate 
specificity and define conformational flexibility (Knighton et al., 
1991). Structural analysis of the regulatory subunit reveals mech- 
anism of cAMP recognition (Su et al., 1995). Subsequent struc- 
tural analysis of PKA with both catalytic and regulatory subunits 
provides a molecular explanation for the inhibition of PKA and 
identifies a model for cAMP-mediated activation through binding 
to the regulatory subunit (Figure 3B) (Kim et al., 2005; Wu et al., 
2007). A 2.3 Å X-ray structure of the intact, tetrameric PKA
holoenzyme yields insights into allosteric regulation and has
ramifications for understanding the regulation of other conserved
kinases (Figure 3C) (Zhang et al., 2012).

Cyclin-Dependent Kinases

The temporal-spatial activities of cyclin-dependent kinases 
(CDKs) determine the fate of the cell cycle and thus are subject 
to complex regulation. Unlike most kinases, which are active in 
their free forms, free CDKs are inactive and only attain a basal 
level of catalytic activity upon association with cyclin. The basal 
activity can be greatly stimulated by phosphorylation of the acti- 
vation loop (also known as T loop). The Cip/Kip family of cell-cy- 
cle inhibitors, exemplified by p27, potently inhibits the kinase 
activity of the assembled cyclin-CDK complexes, whereas mem- 
bers of the INK4 inhibitor family can bind the inactive, free CDK, 
thus preventing its activation. The INK4 inhibitors, exemplified by 
p16INK4a, also bind and inhibit active cyclin-CDK complexes. 

Crystal structure of free CDK2 reveals a misaligned confor- 
mation for the activation helix (PSTAIRE helix) and the T loop, 
providing an immediate explanation for the lack of catalytic activity
of the free kinase (De Bondt et al., 1993). Binding by cyclin A
induces large conformational changes in the PSTAIRE helix and 
T loop, resulting in correct alignment of the active site residues 
and opening of the catalytic cleft (Jeffrey et al., 1995). Phosphor- 
ylation of CDK2 in the T loop results in the reorganization of the 
local structural elements, priming it for substrate binding (Russo 
et al., 1996b). Remarkably, the inhibitory domain of p27Kip1
adopts an extended conformation to interact with both cyclin A
and CDK2, with an invariant residue inserting into the catalytic
cleft of CDK2 and displacing ATP (Russo et al., 1996a). By
contrast, the INK4 inhibitors p16INK4a and p19INK4d bind close
to the ATP-binding site of CDK6 and cause structural changes
in the catalytic cleft to negatively affect the kinase activity (Brotherton
et al., 1998; Russo et al., 1998). The binding site for
p16INK4a or p19INK4d is opposite to where cyclin binds, explaining
why the INK4 inhibitors can bind both free and cyclin-bound
CDKs.

Advent of structural information on CDKs greatly facilitated the 
screening and design of specific small-molecule inhibitors that 
block their kinase activities. These inhibitors, mostly ATP analogs,
effectively stopped cancer cell growth in culture dishes and in
animal models. The outcome of anticancer clinical trials thus
far has been less than desirable, with notable toxicity and marginal
efficacy. For example, flavopiridol exhibited multifaceted
antiproliferative effects in preclinical research but performed
disappointingly in clinical trials, with a narrow therapeutic
window and off-target effects. The more selective, second-generation
CDK inhibitor dinaciclib has entered a phase III clinical trial.
A notable advance is the development of palbociclib, a highly selective
inhibitor of CDK4 and CDK6, for treatment of HER2-/ER+
breast cancer. In addition, the problem of cross-reactivity
by ATP analogs can be effectively dealt with by the development 
of inhibitors that target sites other than the ATP-binding pocket 
(Abate et al., 2013). 

BCR-ABL, EGFR, and Other Kinase Targets

BCR-ABL is the fusion product of chromosome translocation — a 
portion of the Bcr gene on chromosome 22 is fused to the Abl 
tyrosine kinase gene on chromosome 9 in the Philadelphia chro- 
mosome. BCR-ABL is constitutively activated and drives the 
development of chronic myelogenous leukemia (CML). Perhaps
the most celebrated anticancer kinase inhibitor is imatinib, also
known as gleevec or STI-571, which targets BCR-ABL (Shah and
Sawyers, 2003). Nearly all chronic myelogenous leukemia (CML)
patients who took the drug in the first phase I trial in 1998
responded; imatinib was approved by the FDA in 2001. Imatinib
also potently inhibits the tyrosine kinases c-kit and PDGFR,
and its clinical use has been approved for a few other cancer
types associated with c-kit or PDGFR activation. Crystal structures
reveal the precise interactions of an imatinib variant with
surrounding residues in the ATP-binding pocket of BCR-ABL
and explain how patient-derived mutations (such as T315I)
disrupt imatinib binding (Figure 3D) (Nagar et al.,
2002; Schindler et al., 2000). Imatinib, classified as a type II
kinase inhibitor, binds the kinase in its inactive DFG-out
conformation. The structural information guided subsequent
development of second-generation inhibitors that stabilize
these kinase-specific inactive conformations. These inhibitors
include ponatinib, which is effective against the T315I mutation,
and nilotinib (sometimes known as super gleevec, Figure 3D),
which works well against the majority of CML mutations
(except T315I). The structural information also helped the
design of the type I kinase inhibitors, exemplified by dasatinib
(Figure 3D), which bind the kinase in its active DFG-in conformation.
These second-generation inhibitors are used to treat
relapsed CML patients with imatinib-resistant mutations.

Figure 3. Kinase Structure, Regulation, and
Inhibitors

(A) Crystal structure of the catalytic subunit of
protein kinase A (PKA, PDB code 1ATP, colored
green). The activation helix (αC helix) in the N lobe
is colored purple, and the activation loop (T loop) is
highlighted in red. ATP and phosphothreonine 198
(Thr198) in the T loop are indicated. The substrate
analog inhibitor is represented in gray.

(B) Crystal structure of the complex between PKA
catalytic subunit and regulatory subunit (PDB
code 3FHI). The PKA catalytic and regulatory
subunits are colored green and blue, respectively.
The ATP analog AMP-PNP is shown in sticks. The
inhibitory region of the regulatory subunit is represented
in gray.

(C) Mechanism of cAMP-mediated activation of
PKA. In the absence of cAMP, two catalytic subunits
(green) and two regulatory subunits (blue)
assemble into an inactive, tetrameric PKA holoenzyme
(PDB code SNTP). The binding of cAMP to
the regulatory subunits causes pronounced
conformational changes, leading to dissociation of
the catalytic subunits and producing a dimer of
cAMP-bound regulatory subunits (PDB code
4MX3) and two free active catalytic subunits (PDB
code 1ATP).

(D) Small-molecule inhibitors of BCR-ABL as
potent anticancer drugs. Crystal structure of the
human ABL1 kinase domain bound to the anticancer
drug imatinib (PDB code 2HYY) is shown in
the left panel. Imatinib is a potent drug targeting
CML through binding and inhibition of BCR-ABL.
Two other inhibitors, nilotinib and dasatinib, are
used to treat CML patients with imatinib-resistant
mutations. The three small-molecule inhibitors are
shown in the right panel.

The epidermal growth factor receptor (EGFR) family of receptor
tyrosine kinases (RTKs) has four members, ErbB1-4, each with
an extracellular ligand-binding domain, a single transmembrane
segment, and an intracellular kinase domain. The activation
mechanism of EGFR has been elegantly revealed by crystallo- 
graphic discovery and subsequent biochemical analysis 
(Burgess et al., 2003; Zhang et al., 2006). Growth factor binding 
induces conformational changes in the extracellular domain, 
leading to its dimerization with the two ligands on opposite sides 
of the heterocomplex (Garrett et al., 2002; Ogiso et al., 2002). The 
EGFR kinase domain exists in an autoinhibited conformation, 
with key residues Leu834 and Leu837 stabilizing the inactive 
conformation of the αC helix. Receptor dimerization, induced
either by ligand binding or by high protein concentrations during 
crystallization, facilitates formation of an asymmetric dimer, 
where the C-lobe of the activator kinase interacts with the 
N-lobe of the receiver kinase, leading to allosteric activation of 
the latter (Zhang et al., 2006). The structural observations also 
provide a satisfying explanation for how mutations of Leu834/
Leu837 drive EGFR activation. Aberrant activation of EGFR con- 
tributes to the development of a number of malignant cancer 
types. A few small-molecule inhibitors of EGFR have been 
used in the clinic to treat cancers, such as gefitinib (popularly 
known as Iressa) for treating non-small-cell lung cancer 
(NSCLC). Patients with activating mutations in EGFR respond 
very well to treatment with gefitinib (Lynch et al., 2004; Paez
et al., 2004). Other small-molecule inhibitors targeting EGFR 
include lapatinib for breast cancer and erlotinib (or Tarceva) for 
NSCLC and pancreatic cancer. 

Monoclonal antibodies have also been developed to prevent 
ligand binding to EGFRs or to sabotage dimerization of the extra- 
cellular domain. Herceptin, or trastuzumab, which blocks ligand 
binding by associating with the extracellular domain of ErbB2/ 
HER2 (Cho et al., 2003), proved to be effective in treating 
ErbB2-overexpressing breast cancer (Recondo et al., 2014). 
Pertuzumab, on the other hand, prevents receptor dimerization 
by binding to the extracellular domain of ErbB2/HER2 (Franklin 
et al., 2004). Cetuximab, or Erbitux, which prevents both ligand 
binding and receptor dimerization through binding to domain III 
of the EGFR extracellular region (Li et al., 2005), has been 
approved for the treatment of metastatic colon cancer, NSCLC, 
and head and neck cancer. More recently, monoclonal antibody 
and cytotoxic small molecule have been combined into a single 
entity, named antibody-drug conjugates, which directly target 
cancer cells with high-dose chemotherapy. Trastuzumab em- 
tansine (or Kadcyla) is such an antibody-drug conjugate that 
combines trastuzumab and the cytotoxic agent mertansine; it 
is approved to treat HER2-positive metastatic breast cancer 
(Recondo et al., 2014). 

Chemical Genetics of Protein Kinases 
The structural information not only facilitated drug discovery 
targeting various malignancies but also gave birth to chemical 
genetics on kinases. The proto-oncogene c-Src, discovered by 
J. Michael Bishop and Harold Varmus, exists in normal cells in 
an inactive conformation. In the crystals of autoinhibited c-Src, 
the SH3 domain, the SH2 domain, and the kinase domain 
interact with each other to assemble into a tightly folded assem- 
bly in which the phosphorylated Tyr527 binds to the SH2 domain 
and locks c-Src in an inactive conformation (Sicheri et al., 1997; 
Xu et al., 1997a). Activation of the c-Src pathway, triggered by 
dephosphorylation of Tyr527, contributes to multiple cancer
types. On the basis of structural modeling, the ATP-binding
site of v-Src was mutated such that the resulting v-Src, but not 
the wild-type (WT) v-Src, could accept a synthetic ATP analog 
(Shah et al., 1997). The engineered v-Src displayed catalytic
efficiency and substrate specificity similar to those of the WT v-Src
but allowed direct tracing of v-Src substrates in cells (Shah et al.,
1997). This strategy was applied to other Src family members 
and general protein kinases. The engineered kinases, along 
with the unique ATP analogs, allowed investigation of important 
biological questions, such as identification of Cdk1 targets in
yeast, selective inhibition of neurotrophin in vivo, and discovery 
of JNK2 as a positive regulator of c-Jun. 

Membrane Protein Structure and Function 

Membranes compartmentalize cellular processes and enzy- 
matic reactions, and membrane proteins account for 20%- 
30% of all human proteins. Structures of membrane proteins, 
however, account for just over 1% of all entries in the
PDB, with 1,520 total entries and 499 unique structures as of
August 31, 2014 (Figure 2B). The structural investigation of membrane
proteins had been hampered by the technical challenges
of poor recombinant expression, insolubility in aqueous solution, 
and unruly behavior in detergent solubilized forms. In addition, 
crystals of membrane protein usually diffract X-rays poorly. 
Consequently, structural biology of membrane proteins lagged 
behind that for soluble proteins by two decades. 
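
The proportions quoted above can be checked with a few lines of arithmetic. The following Python sketch is illustrative only: it uses the entry counts and percentages stated in the text and in the Figure 2 legend, and the variable and function names are hypothetical. It also assumes that the membrane protein count (as of August 31, 2014) can be compared directly with the PDB total (as of August 26, 2014), which is an approximation.

```python
# Figures quoted in the text and in the Figure 2 legend (August 2014).
TOTAL_PDB_ENTRIES = 102_863          # all PDB entries, August 26, 2014
MEMBRANE_ENTRIES = 1_520             # membrane protein entries, August 31, 2014
UNIQUE_MEMBRANE_STRUCTURES = 499     # unique membrane protein structures

# Share of entries by experimental method, in percent, as quoted.
METHOD_SHARE_PERCENT = {"X-ray": 88.7, "NMR": 10.3, "EM": 0.8}

# Total entries at the end of selected years, as quoted in Figure 2A.
ENTRIES_BY_YEAR = {1980: 69, 1990: 507, 2000: 13_597, 2010: 70_039}


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


def approx_entries_by_method(total, shares):
    """Convert percentage shares into approximate entry counts."""
    return {method: round(total * share / 100.0) for method, share in shares.items()}


def fold_growth(entries):
    """Fold increase in entries between consecutive listed years."""
    years = sorted(entries)
    return {(a, b): entries[b] / entries[a] for a, b in zip(years, years[1:])}


membrane_share = percent(MEMBRANE_ENTRIES, TOTAL_PDB_ENTRIES)
by_method = approx_entries_by_method(TOTAL_PDB_ENTRIES, METHOD_SHARE_PERCENT)
growth = fold_growth(ENTRIES_BY_YEAR)

print(f"Membrane protein share of PDB entries: {membrane_share:.2f}%")
for method, count in by_method.items():
    print(f"{method}: about {count} entries")
for (start, end), fold in growth.items():
    print(f"{start} to {end}: {fold:.1f}-fold increase")
```

The membrane protein share computed this way is consistent with the "just over 1%" stated in the text, and the decade-by-decade ratios make the steep rise of the 1990s explicit.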

Early effort focused on endogenous membrane proteins, 
eliminating the hassles of recombinant expression. Crystal 
structure of a bacterial photosynthetic reaction center at 3 Å
resolution, the first atomic-resolution image of any integral
membrane protein, reveals the stunning inner workings usually
buried within the membrane (Deisenhofer et al., 1985) (Figure 1).
X-ray structure of the F1-ATPase from bovine heart mitochondria,
determined at 2.8 Å resolution, captured the three catalytic
β subunits in distinct conformations and different states of
nucleotide binding (Abrahams et al., 1994) (Figure 1). This structural
observation lends critical support to the hypothesis that
the three catalytic subunits are at different states of the catalytic
cycle at any moment, and rotation of the α3β3 subcomplex
relative to the γ subunit results in the interconversion of the
states. 

Roderick MacKinnon was among the first to use a recombinant
system to express membrane proteins, namely potassium channels,
for structural studies (Figure 1). Despite similar properties between
potassium (K+) and sodium (Na+) ions, K+ channels are at least
four orders of magnitude more permeant for K+ than for Na+.
Crystal structure of the channel from Streptomyces lividans
(KcsA) at 3.2 Å resolution revealed the first atomic view of an
ion channel (Figure 4A) (Doyle et al., 1998). Tetrameric arrangement
of the selectivity filter and structural features of the
pore provide explanations for ion selectivity and conduction
(Doyle et al., 1998; Roux and MacKinnon, 1999; Zhou et al.,
2001). Only two out of four potential K+ binding sites are occupied
in the selectivity filter, allowing energy minimization and
optimization of conduction rate (Morais-Cabral et al., 2001).
Subsequent structural and biochemical studies on K+ channels
uncovered mechanistic insights into gating of the ion-conducting 
pore (Jiang et al., 2002, 2003; Long et al., 2005). 
Figure 4. Structures and Mechanisms of Representative Channels

(A) Crystal structure of the potassium channel KcsA (PDB code: 1BL8). KcsA is
a homotetramer. Each subunit contains two α helices connected by the pore
region, which harbors the selectivity filter. The selectivity filter is shown in a
close-up view, with three ions (magenta spheres) identified. Two ions are
in a rapid equilibrium, with only one of the lower two positions occupied at any
instant. One water molecule is shown as a red sphere.

(B) Crystal structure of the water channel aquaporin 1 (AQP1) (PDB code:
1J4N). AQP1 is a tetramer with each subunit providing an independent water
pore. Four water molecules (red spheres) are located at three hydrophilic 
nodes along the selectivity filter. 

(C) Crystal structure of the mechanosensitive channel MscL (PDB code:
2OAR). MscL is organized as a homopentamer. A water-filled opening from the
extracellular side narrows at the cytoplasmic side, where hydrophilic residues 
may function as the selectivity filter. 

(D) Crystal structure of the AMPA-type glutamate receptor GluA2 (PDB code: 
3KG2). The GluA2 structure is a homotetramer. In the antagonist-bound 
structure, the ion channel adopts a closed conformation. Residues forming the 
narrowest region are indicated in the close-up view. 



G-Protein-Coupled Receptors 

GPCRs define a large family of seven-transmembrane proteins
that mediate a wide range of signaling at the plasma membrane.
Approximately half of all clinical drugs directly target GPCRs.
Working with Robert Lefkowitz, Brian Kobilka cloned and biochemically
characterized human α2- and β2-adrenergic receptors
(Kobilka et al., 1987a, 1987b, 1988). The X-ray structure of
bacteriorhodopsin, which bears homology to mammalian GPCRs, was
determined in 1997 (Pebay-Peyroula et al., 1997), followed by
the structure determination of bovine rhodopsin (Palczewski
et al., 2000). The conformation of the seven transmembrane helices
(TMs) in bovine rhodopsin differs significantly from that in
bacteriorhodopsin. Kobilka and colleagues determined the crystal
structure of the human β2 adrenergic receptor (β2AR) at 3.4-3.7 Å
resolution (Rasmussen et al., 2007). The relatively poor
X-ray diffraction of β2AR crystals was successfully mitigated by
insertion of T4 lysozyme into the third intracellular loop
(Rosenbaum et al., 2007). The crystal structure of the resulting β2AR bound
to a diffusible ligand, carazolol, was determined at 2.4 Å resolution,
revealing extensive interactions of carazolol with residues at
the ligand-binding site (Cherezov et al., 2007). Structures of
activated and/or agonist-bound, as well as antagonist-bound,
GPCRs reveal distinct conformations of the ligand-binding
pocket. The most notable ligand-induced conformational change
on the cytoplasmic side appears to be an outward movement of
the cytoplasmic portion of TM5 and TM6. A wealth of rapidly
emerging GPCR structures has greatly stimulated the interest
of major pharmaceutical companies in improving existing
drugs and in screening and designing new therapeutic modulators.

The principal biological question about GPCRs is how conformational
changes triggered by ligand binding result in the activation
of G protein. A tentative answer to this question was supplied by
the crystal structure of an agonist-bound β2AR in complex with a
nucleotide-free Gs heterotrimer (Rasmussen et al., 2011)
(Figure 1). The most pronounced agonist-induced conformational
change in β2AR is a 14 Å outward movement at the cytoplasmic
end of TM6 and TM5. The conformational changes
induced by the interactions between β2AR and Gs are propagated
to the nucleotide-binding pocket, presumably facilitating
replacement of GDP by GTP. The most unanticipated change
is a marked displacement of the α-helical domain of Gαs relative
to the Ras-like GTPase domain (Rasmussen et al., 2011).
Membrane Transport Proteins

A living cell constantly needs to take up nutrients from the environment
and to expel metabolites and waste materials. This
extremely complex process is mediated by a very large number 
of membrane transport proteins that can be classified into at 
least four general types: channels or pores, facilitators, secondary
active transporters, and primary active transporters. An online
database of membrane transport proteins (www.TCDB.org)
contains more than 10,000 unique protein sequences that
are classified into over 800 transporter families. Membrane pro- 
teins of known three-dimensional structure are cataloged online 
at http://blanco.biomol.uci.edu/mpstruc/. I do not intend to 
comprehensively cover all known structures of transporters. 
Rather, I will simply give a few representative examples. 

Channels. The aquaporin family of channels, consisting of 
the water-conducting aquaporins and the glycerol-conducting 
aquaglyceroporins, plays an essential role in the regulation of 
cellular osmolarity. Structural analysis of aquaporin-1 (AQP1) re- 
vealed a tetrameric assembly and identified four water mole- 
cules that are localized at three hydrophilic nodes along an 
otherwise highly hydrophobic selectivity filter (Figure 4B), sup- 
porting rapid water transport (Sui et al., 2001). Subsequent struc- 
tural analysis of aquaporin Z identified the molecular mechanism 
for differentiation between water and glycerol by the aquaporin 
family (Savage et al., 2003). Structure of the glycerol facilitator 
GlpF revealed an amphipathic selectivity pore that is lined by a
number of glycerol molecules in single file (Fu et al., 2000). Struc- 
tural analysis and molecular dynamics simulation elucidated the 
molecular mechanism of selective permeability for glycerol (Fu 
et al., 2000; Tajkhorshid et al., 2002). The aquaporin fold is 
also observed in other small-molecule transporters such as the 
pentameric formate channel FocA (Waight et al., 2010; Wang 
et al., 2009). 

Physical forces, in the form of touch, hearing, pressure, and 
gravity, are primarily sensed by a family of mechanosensitive 
ion channels, which transduce mechanical strain into an electrochemical
response. The X-ray structure of MscL, determined
at 3.5 Å resolution, revealed a homopentamer (Figure 4C) (Chang
et al., 1998). In each subunit, a water-filled opening at the extra- 
cellular side is followed by a hydrophilic pore that narrows to an 
occluded region at the cytoplasmic side. This structure serves as 
a model system for understanding of and comparison with other 
mechanosensitive channels. Structural analysis of the small- 
conductance mechanosensitive channel (MscS) shows a hepta- 
meric assembly, with the closed-state transmembrane pore 
connecting to a large chamber in the cytoplasmic side (Bass 
et al., 2002). Notably, the overall fold in the transmembrane re- 
gion is different between MscL and MscS, with two TMs in 
MscL and three TMs in MscS. 

Chemical transmitters of excitatory synapses in the central 
nervous system, exemplified by glutamate, activate receptors 
on the postsynaptic cells, leading to transmission of signals 
from one neuron to the next. Ionotropic glutamate receptors 
(iGluRs) are ligand-gated ion channels. The iGluR family includes 
AMPA, kainate, and NMDA receptors, which are heterotetrameric
or homotetrameric. Crystal structure of the homotetrameric 
AMPA-type GluA2 receptor bound to a competitive antagonist 
revealed a closed conformation, with a 2-fold symmetry in the
extracellular domain and a 4-fold symmetry in the membrane-spanning
ion channel domain (Figure 4D) (Sobolevsky et al.,
2009). Structural analysis of the GluA2 receptor allowed propo- 
sition of mechanisms for ion channel activation, desensitization, 
and inhibition by noncompetitive antagonists. These mecha- 
nisms were further investigated by the X-ray structure of GluA2 
bound to cone snail toxin (Chen et al., 2014) and in distinct functional
states (Durr et al., 2014). The structure of the NMDA-type,
heterotetrameric receptor involving two GluN1 and two GluN2B
subunits confirmed some of the proposed general mechanisms 
for iGluRs and provided additional insights (Karakas and Furu- 
kawa, 2014; Lee et al., 2014). 

Secondary Active Transporters. Unlike channels, transporters 
never allow simultaneous substrate access from both sides of 
the lipid membrane. A prevailing model for the general transport 
mechanism is known as alternating access (Jardetzky, 1966), 
which postulates that a transporter must switch between at least 
two conformations: open to the extracellular side (outward- 
open) for substrate upload and open to the intracellular side (in- 
ward-open) for substrate release, or vice versa. Restrained by 
a series of conformational switches within each transport cycle, 
a transporter can only move 200-50,000 substrate molecules 
per second, considerably slower than that for channels (which 
sometimes can permeate substrates at near diffusion limit). 
The ability to transport substrate molecules against their concentration
gradients, together with the different conformational
states within each transport cycle, has made membrane transporters
appealing for structural and mechanistic investigations.
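
The alternating access scheme described above can be illustrated with a toy state machine. This is a minimal sketch under stated assumptions, not a model taken from the literature: the class ToyTransporter, its two states, and the helper seconds_per_cycle are hypothetical names introduced here for illustration. The sketch encodes only two points from the text: the binding site is exposed to one side of the membrane at a time, and a turnover of 200-50,000 molecules per second corresponds to a finite time per transport cycle.

```python
from enum import Enum


class State(Enum):
    OUTWARD_OPEN = "outward-open"
    INWARD_OPEN = "inward-open"


class ToyTransporter:
    """Hypothetical two-state alternating-access model (illustration only)."""

    def __init__(self):
        self.state = State.OUTWARD_OPEN
        self.bound = False      # is a substrate molecule in the binding site?
        self.delivered = 0      # molecules released on the intracellular side

    def accessible_sides(self):
        # In either state, exactly one side can reach the binding site.
        if self.state is State.OUTWARD_OPEN:
            return {"outside"}
        return {"inside"}

    def cycle(self):
        # Step 1: substrate loads from the extracellular side.
        assert self.accessible_sides() == {"outside"}
        self.bound = True
        # Step 2: conformational switch to the inward-open state.
        self.state = State.INWARD_OPEN
        # Step 3: substrate is released to the intracellular side.
        assert self.accessible_sides() == {"inside"}
        self.bound = False
        self.delivered += 1
        # Step 4: the empty transporter resets to the outward-open state.
        self.state = State.OUTWARD_OPEN


def seconds_per_cycle(turnover_per_second):
    """Average time spent on one transport cycle at a given turnover rate."""
    return 1.0 / turnover_per_second
```

With the rates quoted in the text, seconds_per_cycle(200) corresponds to 5 ms per transported molecule and seconds_per_cycle(50_000) to 20 microseconds, which is the window within which all conformational switches of one cycle must occur.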

The major facilitator superfamily (MFS) transporters are ubiq- 
uitously present in all kingdoms of life and play an important 
role in numerous cellular processes. The first crystal structures 
of MFS transporters were reported for the lactose permease LacY from
E. coli (Abramson et al., 2003) and the glycerol-3-phosphate
transporter GlpT (Huang et al., 2003) (Figure 5A). The structures
revealed a conserved MFS fold of 2-fold symmetric N-domain 
and C-domain, each comprising six consecutive TMs. The trans- 
port path, as well as the substrate-binding site, is located be- 
tween the N and C domains. The structural information provides 
immediate explanation to a large body of biochemical and 
biological observations, particularly on LacY (Kaback, 2005). 
Structures of MFS transporters in distinct conformational states, including the
multidrug transporter EmrD (Yin et al., 2006), the L-fucose:H+ symporter
FucP (Dang et al., 2010), the peptide transporters PepTSo
and PepTSt (Newstead et al., 2011; Solcan et al., 2012), and the
D-xylose:H+ symporter XylE (Sun et al., 2012), appear to support
the alternating access model. Perhaps the best known and most 
important MFS family members are the glucose transporters 
GLUT1-4, which are associated with a number of debilitating 
diseases. Crystal structure of the full-length human GLUT1 — 
the first eukaryotic MFS structure— allows rationalization of dis- 
ease-derived mutations (Figure 5B) (Deng et al., 2014). 

Crystal structure of the sodium:leucine symporter LeuT 
revealed a conserved fold of ten TMs grouped into two inverted 
repeats, each containing five consecutive TMs (Figure 5C) 
(Yamashita et al., 2005). The first TM in each of the two inverted 
repeats, TM1 or TM6, is discontinuous and contains a highly 
conserved unwound segment that is positioned in the trans- 
port path. A similar fold was subsequently observed in the 



Cell 159, November 20, 2014 ©2014 Elsevier Inc. 1003








Figure 5. Structures and Mechanisms of 
Representative Secondary Active Trans- 
porters 

(A) Crystal structure of the E. coli lactose 
permease LacY. The overall structure of LacY is 
shown in two perpendicular views (left and mid- 
dle). The transport path is highlighted by a red 
circle in the middle panel, and its close-up view is 
shown in the right panel. Residues involved in 
substrate binding and proton coupling are colored 
magenta and red, respectively. 

(B) Crystal structure of the human glucose trans- 
porter GLUT1. Two amino acids forming the 
extracellular gate are labeled in the close-up view. 

(C) Crystal structure of the sodium symporter 
LeuT. Arg30 and Asp404 are thought to play a key
role at the extracellular gate, whereas Arg5 and 
Asp369 control the intracellular gate. These four 
amino acids are indicated in the close-up view. 






nucleobase:cation symporter-1 family Mhp1 (Weyand et al., 
2008) and the solute sodium symporter family vSGLT (Faham 
et al., 2008). Remarkably, despite lack of sequence or functional 
conservation, the same general LeuT fold has been recognized in 
several membrane transporter families, exemplified by the glycine:betaine
transporter BetP (Ressl et al., 2009), and amino acid
antiporters AdiC and GadC (Fang et al., 2009; Gao et al., 2009, 
2010; Ma et al., 2012). These unanticipated structural findings 
led to reclassification of membrane transporters, with nine 
LeuT-fold families now grouped together to constitute the APC
superfamily (Saier et al., 2009). 

Primary Active Transporters. Primary active transporters,
exemplified by the sarco/endoplasmic Ca2+-ATPase (SERCA)
and the ATP binding cassette (ABC) transporters, exploit the energy
of ATP binding and hydrolysis for substrate transport, usually
against the concentration gradient of the substrate molecules.
The 2.6 Å resolution crystal structure of SERCA1a revealed a
complex architecture, with two calcium ions bound in the
membrane-spanning region (Figure 6A) (Toyoshima et al., 2000).
Structural comparison with the enzyme in the absence of calcium
suggested large domain movements during active transport,
which were confirmed by the structure of SERCA1a in a calcium-free
state (Toyoshima and Nomura, 2002). The active transport of
calcium ions by this ATPase pump involves several distinct
conformational states. These conformational states were visualized
by a number of related crystal structures of SERCA,
including those bound to ATP or an ATP analog (Olesen et al.,
2007; Olesen et al., 2004; Sorensen et al., 2004; Toyoshima
and Mizutani, 2004), ADP (Sorensen et al., 2004), and a phosphate
analog (Olesen et al., 2007; Toyoshima et al., 2004). Together,
these structures allow mechanistic understanding of the complete
cycle of calcium transport driven by ATP binding and hydrolysis.

An ABC transporter contains at least 
four subunits: two transmembrane do- 
mains (TMDs) and two ABCs (or nucleo- 
tide binding domains, NBDs) located in 
the cytoplasm. The first crystal structure 
of the ABC transporter— that of the tetrameric BtuCD complex 
at 3.2 A resolution— revealed a central translocation path that 
is formed between two BtuC subunits within the membrane 
and closed to the cytoplasmic side by a gate region (Figure 6B) 
(Locher et al., 2002). This structure serves as a framework for 
all subsequent structural investigations and mechanistic under- 
standing. Crystal structures, including a bacterial multidrug 
transporter Sav1866 (Dawson and Locher, 2006), a putative 
metal-chelate-type transporter (Pinkett et al., 2007), a maltose 
transporter (Oldham et al., 2007), BtuCD bound to the periplasmic
binding protein BtuF (Hvorup et al., 2007), a bacterial lipid
flippase MsbA (Ward et al., 2007), a molybdate/tungstate trans- 
porter ModBC (Gerber et al., 2008), and a methionine transporter 
(Kadaba et al., 2008), captured different conformational and 
nucleotide-bound states. These snapshots allow proposition of 
a complete transport cycle for an ABC transporter. The transport 
cycle is elegantly shown for the maltose transporter MalFGK2, 
with a number of conformational states observed in atomic detail
(Chen et al., 2013; Khare et al., 2009; Oldham and Chen,
2011a, 2011b; Oldham et al., 2007).

The energy-coupling factor (ECF) family of membrane transporters
shares a similar organization with the ABC transporters,
with two NBDs and two TMDs. In contrast to the ABC transporters,
the two NBDs of an ECF transporter are encoded by
two different ATPase genes, and the two transmembrane domains
have contrasting roles, with one specific for substrate
binding (S component) and the other for energy transduction
(T component). Structures of the heterotetrameric ECF
transporters revealed a striking organization: the six TMs of the
S component are placed roughly parallel to the lipid membrane
and perpendicular to the TMs of the T component (Figure 6C)
(Wang et al., 2013; Xu et al., 2013). These structures, together
with results of MD simulation, strongly support the carrier model
of substrate transport (Widdas, 1952), which differs from the
alternating access model.

Intramembrane Proteases. Regulated intramembrane proteol- 
ysis (RIP), which requires cleavage of a membrane-embedded 
substrate protein by an intramembrane protease within the lipid 



Figure 6. Structures and Mechanisms 
of Representative Primary Active Trans- 
porters 

(A) Crystal structure of the calcium ATPase SERCA1a.
The structure of SERCA1a is displayed in a
rainbow-colored cartoon representation with its
amino-terminus in blue and carboxyl-terminus in
red (left). Three cytoplasmic domains, A, N, and P,
are labeled, and the transmembrane domain
consists of ten TMs. The calcium binding sites are
shown in a close-up view (right). Two calcium ions
are represented as magenta spheres.

(B) Crystal structure of the ABC transporter 
BtuCD. BtuCD consists of four subunits, two 
membrane-spanning subunits BtuC (purple and 
blue), and two ABC subunits BtuD (cyan and 
green). The transport path is closed to the cyto- 
plasm by a gate region, with two residues Thr142 
and Ser143 playing a key role. 

(C) Crystal structure of a representative ECF 
transporter. The structure of a tetrameric ECF 
transporter is shown in two perpendicular views. 
The TMs of the substrate-binding S component 
(EcfS) are nearly parallel to the lipid membrane and 
perpendicular to those of the energy-transducing 
T component (EcfT). The structural organization 
revives the carrier model for membrane trans- 
porters. 



bilayer, is a universally conserved 
signaling mechanism from bacteria to hu- 
man beings (Brown et al., 2000). The 
most remarkable examples of RIP include 
cleavage of the cholesterol-controlling 
transcription factor SREBP by the metal- 
loprotease S2P, the cleavage of the 
growth factor Spitz by the serine protease 
rhomboid, and the cleavage of amyloid 
precursor protein by the aspartate pro- 
tease presenilin (which is the catalytic 
component of the γ-secretase). Unlike
proteases in aqueous environment, the 
hydrophilic active site of an intramem- 
brane protease must be accessible to 
water molecules, which are required for 
proteolysis, and substrate proteins within 
the hydrophobic lipid membrane. Struc- 
tural information is now available on the 
bacterial homologs of the three classes 
of intramembrane proteases, including 
the rhomboid homolog GlpG from E. coli
(Ben-Shem et al., 2007; Lemieux et al., 
2007; Wang et al., 2006; Wu et al., 2006), the archaeal S2P homolog
(Feng et al., 2007), and the presenilin homolog PSH (Li et al.,
2013). In all cases, the active site has constant access to water 
molecules through a funnel that opens either to the extracellular 
side or the cytoplasmic side. This crystallographic finding is 
elegantly simple yet completely unanticipated in the absence 
of structural information. The active site is shielded from the hy- 
drophobic membrane bilayer by one or two gating TMs, which 
undergo lateral rotation or movement to allow substrate entry. 
For GlpG, the gating helix TM5 has been captured in fully
open, partially open, and closed conformations. The recent cryo-EM
structure of human γ-secretase (Lu et al., 2014), along with
the X-ray structure of its putative substrate-recruiting component
nicastrin (Xie et al., 2014), revealed tantalizing clues about
functional mechanisms of γ-secretase.

Method Development 

Protein Expression and Purification 

Prior to the 1970s, all proteins employed for X-ray crystallography
were derived from endogenous sources (mostly animal tissues
and organs) and were biochemically purified. These proteins,
exemplified by myoglobin and lysozyme, are thermodynamically
stable and contain few flexible surface sequences that may
hinder crystallization. In the late 1970s, discovery of the restriction
enzymes greatly facilitated cloning of target cDNA into plasmids,
allowing recombinant expression in E. coli. Unfortunately, many
recombinant proteins, especially those containing multiple do- 
mains and/or flexible surface sequences, defied crystallization. 
Finding a crystallizable protein domain or fragment frequently 
required multiple trials of protein engineering, each involving 
different boundaries or mutations for the target protein. Limited 
proteolysis has been used to identify stable core domains that 
are amenable to crystallization. One of the most remarkable examples
of limited proteolysis was the identification of sequence
boundaries for the DNA-binding domain of p53 (Pavletich et al., 
1993). 

Recombinant coexpression of two or more proteins became a
popular strategy in the 1990s. This strategy allowed convenient
assembly of multicomponent protein complexes. Although bacterial
expression is fast and inexpensive, eukaryotic proteins are
frequently misfolded or aggregated when expressed in bacteria,
likely due to differences in folding environment. Eukaryotic
recombinant expression systems, involving yeast, baculovirus-infected
insect cells, and mammalian cells, play an increasingly
important role (Assenberg et al., 2013). For mammalian
cell expression, both stable cell lines and transient expression
through plasmids have succeeded in production of large quantities
of materials for crystallization. Stable CHO cells allowed
expression and purification of milligram quantities of γ-secretase,
a four-component membrane protein complex (Alattia et al.,
2013).

Crystallization

Until the early 1990s, approximately half of all crystals were generated
with ammonium sulfate as the major precipitant in the
crystallization buffer. In the year 2013, however, none of the 55 
crystal structures reported in the journal Nature relied on ammo- 
nium sulfate as a crystallization agent. This dramatic change 
likely reflects the proposition that ammonium sulfate may work 
better for small proteins of greater thermal stability. Perhaps
more importantly, the reagents and tools for crystallization 
have steadily improved in the last two decades, including 
numerous commercial reagent kits, robotic arms, and develop- 
ment of special methodology for membrane protein crystalliza- 
tion (McPherson and Gavira, 2014). The lipidic cubic phase 
(LCP) has been widely used for the crystallization of membrane 
proteins (Caffrey, 2009), with fantastic successes for GPCRs 
(Cherezov et al., 2010). Bicelles were also applied to membrane 
protein crystallization (Agah and Faham, 2012). 



Although prior experience may facilitate crystallization of a 
new target protein or macromolecular complex, systematic 
screening of representative crystallization space is almost al- 
ways recommended. In all cases of difficult proteins, biochem- 
istry and molecular biology are the keys for crystallization — 
that is, improvement of solution behavior through biochemical 
characterization and extensive protein engineering. Insertion of 
a T4 lysozyme into β2AR proved to be important for improvement
of crystal packing and hence X-ray diffraction limit
(Rosenbaum et al., 2007). Sequence alignment among homologous
proteins frequently yields valuable hints on how to improve the 
chance of crystallization. For example, replacing five noncon- 
served amino acids with conserved ones led to crystallization 
of a presenilin homolog (Li et al., 2013). The crystallization of 
the AMPA-type receptor GluA2 required carboxyl-terminal and 
internal deletion, removal of potential glycosylation sites, alanine
substitution, and specific mutation (Sobolevsky et al., 2009).

Structure Determination

Since its invention (Arndt and Wonacott, 1977), the
oscillation/rotation protocol has become the primary X-ray
diffraction data collection method on single crystals. Powerful
rotating anode generators have been improved continuously to
generate X-rays with high intensity. Synchrotron radiation, which
supplies X-rays with tunable wavelength and much higher X-ray
flux than home sources, was brought to the attention of biological
crystallographers (Rosenbaum and Holmes, 1971) and quickly
became an intimate partner of structural biology. Fast-readout
large-area X-ray detectors have been critically important for
data collection. Synchrotron beamlines are equipped with
charge-coupled device (CCD) detectors, which make data collection
much faster and more accurate than ever before (Phillips et al.,
1993). More recently, shutter-free data collection
has been made possible by pixel array detectors such as the Pilatus 6M
(Kraft et al., 2009). Cryogenic protection by flash freezing crystals
at ~100 K greatly reduced radiation damage (Rodgers,
1994), which allows complete data collection from a single crystal,
instead of merging data from multiple crystals as in the past.

Software for X-ray data collection and processing, structure 
determination, and model building has seen major improvements 
in the last two decades. Most notably, structure determination 
based on a single crystal became possible with development of 
the multiwavelength anomalous diffraction (MAD) method (Phillips 
and Hodgson, 1980; Yang et al., 1990). The ease of selenomethionine
incorporation into recombinant proteins and the powerful
synchrotron radiation have made MAD the preferred method for 
de novo protein structure determination. Ribonuclease H was 
the first protein whose structure was successfully determined 
by selenium-based MAD analysis (Yang et al., 1990). The free R factor,
introduced as an objective criterion for structural cross-validation
(Brunger, 1992), prevents overrefinement and has quickly
become a key parameter for all X-ray structures. Free R factor 
was also instrumental to implementation of maximum likelihood 
target functions in crystallographic programs (Pannu and Read, 
1996). In 1994, the CCP4 (collaborative computational project, 
number 4) suite was established, which collected a number of 
programs for various macromolecular structure determination 
methods by X-ray crystallography (Collaborative, 1994). Another 
structure determination package CNS (Brunger et al., 1998), 
superseded by PHENIX (Adams et al., 2002), not only provides all
necessary programs for X-ray structure solution but also incorpo- 
rates the refinement method of simulated-annealing after X-plor 
(Brunger, 1990). Structural genomics demanded high-throughput 
crystallography, with a number of automated structure solution 
pipelines established worldwide through integration of existing 
programs for automation. 
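
For readers unfamiliar with the free R factor mentioned above, the standard definition can be written out as follows. This is a sketch in generic notation rather than the notation of any particular program: the symbols S, W, T, and the scale factor k are introduced here for illustration. The same residual is evaluated over two disjoint sets of reflections, and only the working set W enters refinement.

```latex
% Crystallographic R factor over a set S of reflections hkl
R(S) = \frac{\sum_{hkl \in S} \bigl|\, |F_{\mathrm{obs}}(hkl)| - k\,|F_{\mathrm{calc}}(hkl)| \,\bigr|}
            {\sum_{hkl \in S} |F_{\mathrm{obs}}(hkl)|}

% W: working set used in refinement;  T: test set excluded from refinement
R_{\mathrm{work}} = R(W), \qquad R_{\mathrm{free}} = R(T), \qquad W \cap T = \emptyset
```

Because the reflections in T, usually a small random fraction of the data, never influence the model parameters, a free R factor that stays well above the working R factor or fails to drop during refinement signals overfitting.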

The use of MIR demands generation of heavy atom derivatives 
of the crystals. The heavy atom agents “magic seven” (Boggon 
and Shapiro, 2000) and “membrane’s eleven” (Morth et al., 
2006) were recommended for crystals of soluble and membrane 
proteins, respectively. Polynuclear metal clusters were utilized 
for phasing of larger protein complexes (Dauter, 2005), exempli- 
fied by the ribosomal large subunit (Ban et al., 2000). Compared 
to heavy metals, bromide or iodine ions can be quickly soaked 
into protein crystals within minutes. In addition to MIR and 
MAD, mainstream experimental phasing methods also include 
single-wavelength anomalous dispersion (SAD), single isomor- 
phous replacement (SIR), multiple isomorphous replacement 
with anomalous scattering (MIRAS), and single isomorphous 
replacement with anomalous scattering (SIRAS). Molecular 
replacement (MR) is used for phasing with a homologous model. 
Automated model-building algorithms and molecular modeling 
graphics such as COOT (Emsley and Cowtan, 2004), preceded
by O (Jones et al., 1991), have greatly accelerated the pace of 
structure determination. 

Prior to the mid-1990s, it wasn’t uncommon for a PhD student
in an X-ray crystallography laboratory to spend 3 to 4 years
solving the phase problem, which required understanding of Patterson
maps and resolution of the Harker sections. Consequently, students
then usually needed to have certain knowledge of mathematics
and physics. Nowadays, a student only needs to mount
crystals correctly in the X-ray beam, and the rest— data collec- 
tion and structure determination— can be just a few clicks 
away. Intense X-rays at various synchrotron facilities have not 
only greatly expanded our investigative power on small and 
weakly diffracting crystals but have also made high-throughput 
crystallography a reality (Joachimiak, 2009). Remote control of 
data collection at synchrotrons has eased the trouble of long- 
distance traveling and sleepless night fatigue. 

Hard X-Ray Free Electron Laser 

For most X-ray beamlines in existing synchrotrons, destruction 
of crystals by radiation damage accompanies the process of 
data collection. Ultrashort (femtoseconds), high-intensity X-ray 
pulses from free-electron lasers were predicted to provide useful 
structural information before the destruction of the sample by ra- 
diation damage (Neutze et al., 2000). This prediction was 
reached on the basis of computer simulation of radiation dam- 
age, taking into account photon energy, pulse length and inten- 
sity, and sample size. A decade later, the proof of concept was 
established with image construction on diffractions collected 
on individual virus particles (Seibert et al., 2011) and an electron 
density map at 8.5 Å resolution of photosystem I, which required
millions of nanocrystals and more than 3,000,000 diffraction pat- 
terns (Chapman et al., 2011). Both studies were carried out at the 
Linac Coherent Light Source at Stanford University, which offers 
the world’s first hard X-ray free electron laser (XFEL) with an
X-ray intensity of ~10^12 photons in 10 femtoseconds, or about
10-13 orders of magnitude more intense than a regular third-generation
synchrotron. The first high-resolution X-ray structure
by serial femtosecond crystallography (SFX) was accomplished 
on lysozyme (Boutet et al., 2012), followed by de novo generation
of experimental phases for lysozyme (Barends et al., 2014). This 
technology has been successfully applied to structure deter- 
mination of a precursor cathepsin B (Redecke et al., 2013), 
photosystem II (Kern et al., 2013), and a human serotonin recep- 
tor (Liu et al., 2013). 

The breathtaking advances of XFEL-based SFX rely on devel- 
opment of container-free sample spraying techniques, with initial 
crystal hit rate of less than 0.01%. Thus, a vast number of nanocrystals
were a prerequisite for such technology, although
improvement is underway. By design of the SFX technology, all 
X-ray diffraction patterns collected are still, partial images, which 
demand improvements in data processing software (Hattne 
et al., 2014). The fact of one image per crystal also mandates 
the isomorphous nature of the microcrystals. These challenges, 
together with the exceedingly high costs associated with construction
of an XFEL facility, may limit the near-future practical impact
of SFX. Nonetheless, as the ultimate
technology for determination of macromolecular structure in 
aqueous solution at room temperature, XFEL is destined to domi- 
nate in the future. 

Molecular Dynamics 

Crystal structures represent static snapshots of macromolecules,
averaged temporally and spatially over a myriad of subtle
conformational differences. In real life, however, these molecules
may interconvert among a few distinct conformations to
accomplish their biological functions. Thankfully, these distinct
conformational states can be trapped by alteration of the crystallization
conditions, inclusion of a specific ligand, and/or mutation of
specific amino acids. When one or more of these conformational
states proves difficult to capture, molecular dynamics (MD)
simulation comes to the rescue. A supercomputer named Anton
extended the timescale of MD simulation to milliseconds and
therefore enabled the observation of large-scale conformational
changes of macromolecules in silico (Lindorff-Larsen et al.,
2011; Shaw et al., 2010). Using this approach, the transition between
different conformational states was observed for a number
of proteins.

In addition to identifying putative conformational changes, MD 
simulation also helps reveal insights that are sometimes unnoted 
by structural analysis. Simulation of the c-Src and Hck kinases 
revealed an important role for the linker sequence between 
SH3 and SH2 domains, which helps maintain the kinases in an 
inhibited state (Young et al., 2001). MD simulations of 12 ns on the
aquaglyceroporin GlpF identified the spatial and temporal probability
distribution of a single file of seven to nine water molecules
and defined their orientation inside the channel, which mechanistically
explained the impermeability of aquaporin to protons
(Tajkhorshid et al., 2002). The molecular mechanism of ion selectivity,
K+ versus Na+, in potassium channels was rationalized by
molecular simulation and theoretical computation (Bostick and
Brooks, 2007; Noskov et al., 2004).

Time-Resolved X-Ray Crystallography 

Conformational dynamics of macromolecules, especially those
in enzyme reactions, can be captured by time-resolved X-ray
crystallography, which uses intense, polychromatic X-ray pulses
to generate Laue diffraction from a single crystal upon induction
of a light-initiated reaction (Neutze and Moffat, 2012). A classic
example is observation of photodissociation and rebinding of 
carbon monoxide to myoglobin and hemoglobin (Knapp et al., 
2006; Srajer et al., 1996). These time-resolved findings identify
myoglobin photoproducts and serve as a structural basis for
spectroscopic observations and MD calculations. Technical challenges
specific to Laue diffraction, such as overlap of energies
and spots, have been effectively dealt with. Maturation of
time-resolved X-ray crystallography allowed its application to fragile
membrane protein crystals, as evidenced by the direct observation
of light-induced changes in a photosynthetic reaction center
(Burgess et al., 2003). The emergence of XFELs is transforming 
the field of time-resolved X-ray crystallography, despite nu- 
merous unresolved challenges. In contrast to time-resolved 
Laue diffraction, which requires crystals, time-resolved wide- 
angle X-ray scattering (WAXS) allows study of macromolecules 
in solution and has been successfully applied to a number of pro- 
teins, including myoglobin and hemoglobin. 

Concluding Remark 

X-ray crystallography seemed to be at its pinnacle in the early 
1990s, when I, as a motivated graduate student, felt extremely 
excited by the rapid emergence of crystal structures on biologi- 
cally important macromolecules. The forecast was that structural 
biology by X-ray crystallography would become saturated within 
a few years, leaving structural biologists in an awkward situation 
of excess in supply. Another prediction, believed by many in the 
late 1990s, would have structural genomics (automated
high-throughput X-ray crystallography) dominate the research
landscape. Thankfully, neither prediction was remotely close to fact.
X-ray crystallography is alive and well and continues to play an 
extremely important role in deciphering the secrets of life. In 
essence, life is all about function and structure, with the former 
determined by the latter. In the past five decades, structural 
biology through X-ray crystallography has provided important 
mechanistic insights into every discipline of modern experimental 
biology. A large proportion of all display items in the seventh edition
of the Jeremy Berg and Lubert Stryer Biochemistry textbook
are structural images, mostly derived from X-ray crystallographic
analyses. X-ray crystallography is at its pinnacle. What follows
a pinnacle must be a downhill path; would this be true for X-ray
crystallography?

In response to a question raised by a motivated, biophysics- 
oriented student, Doug Rees of Caltech commented at a special 
seminar at the end of 2013 that “If I were to choose a lab to join 
for my PhD thesis research, I would probably go for an EM lab.” 
This statement echoes the emotions of those structural biolo- 
gists who have been so comfortable with X-ray crystallography 
as the method of choice for elucidating three-dimensional struc- 
tures of macromolecules. The recent technological advances in 
cryo-EM single-particle analysis have sent shockwaves through
the entire structural biology community. Near-atomic resolution 
EM structures have been obtained not only for the megacom- 
plexes such as ribosomes and viruses (Amunts et al., 2014; 
Zhang et al., 2010) but also for relatively small protein particles 
exemplified by the TRP channel (Liao et al., 2013). Recently, a
4.5 Å resolution density map was generated for human γ-secretase
(Lu et al., 2014), which has a calculated protein molecular
weight of only 160 kDa with no symmetry. In the 1990s, the 
Cold Spring Harbor Laboratory Course on X-ray Crystallography 
was extremely popular among NMR spectroscopists. That signaled a
trend that is still unfolding today. At the 2014 Kuo KH Summer
School of Electron Microscopy in Shanghai, a number of estab- 
lished X-ray crystallographers were enrolled as students. There 
is no doubt that a new trend is in place and will change the 
pace of structural biology for many years to come. 

John Burdon Sanderson Haldane, one of the greatest evolutionary
biologists of all time, once declared in his essay, The
Future of Biology, that “... the future will make any detailed predictions
look rather silly.” Indeed, the eternal truth in research is
that the unexpected always happens. Within the past 2 years, 
cryo-EM has emerged as a competitive and perhaps even 
more favored tool for elucidation of macromolecular assemblies 
with molecular weight of more than 300 kDa. In the foreseeable 
future, advances in sample preparation and image acquisition 
will likely expand the advantages of cryo-EM over X-ray crystal- 
lography into complexes with smaller molecular weights. On the 
other hand, technological development of XFEL may strongly 
affect the comparison of EM versus X-ray. After all, the ultimate 
goal of XFEL is to make reconstruction of single-molecule scat- 
tering possible, namely to visualize single molecules with atomic 
details in solution. Regardless of these scenarios, X-ray crystallography
will continue to dominate structure determination for
many years to come, owing to its mature methodology, high res- 
olution, convenient accessibility worldwide, and a vast number 
of experienced users. 
SUMMARY

Genomic DNA replicates in a choreographed tempo- 
ral order that impacts the distribution of mutations 
along the genome. We show here that DNA replica- 
tion timing is shaped by genetic polymorphisms 
that act in cis upon megabase-scale DNA segments.
In genome sequences from proliferating cells, read 
depth along chromosomes reflected DNA replication 
activity in those cells. We used this relationship to 
analyze variation in replication timing among 161 individuals
sequenced by the 1000 Genomes Project.
Genome-wide association of replication timing with 
genetic variation identified 16 loci at which inherited 
alleles associate with replication timing. We call these 
“replication timing quantitative trait loci” (rtQTLs). 
rtQTLs involved the differential use of replication ori- 
gins, exhibited allele-specific effects on replication 
timing, and associated with gene expression varia- 
tion at megabase scales. Our results show replication 
timing to be shaped by genetic polymorphism and 
identify a means by which inherited polymorphism 
regulates the mutability of nearby sequences. 

INTRODUCTION 

Replication of eukaryotic genomes follows a strict temporal pro- 
gram with each chromosome containing segments of character- 
istic early and late replication. This program is mediated by the 
locations and activation timing of replication origins along each 
chromosome (Rhind and Gilbert, 2013). Expressed genes tend 
to reside in early-replicating regions of the genome (Rhind and
Gilbert, 2013). Compared to early phases of replication, late
phases of replication are faster, less structured (Koren and
McCarroll, 2014), and more mutation-prone; late-replicating
loci have elevated mutation rates in the human germline
(Stamatoyannopoulos et al., 2009), in somatic cells (Koren et al., 2012),
and in cancer cells (Lawrence et al., 2013). Structural mutations
and chromosome fragility are also more common in late-replicating
genomic regions (Koren et al., 2012; Letessier et al.,
2011). At the other extreme, chromosome fragility (and conse- 
quent mutations) are also increased at specific “early replicating 
fragile sites” (ERFSs), a subset of early replication origins at 
which interference between replication and transcription leads 
to double strand breaks (Barlow et al., 2013; Pedersen and De, 
2013; Drier et al., 2013). These aspects of genome replication 
are conserved all the way to prokaryotes, in which genes close 
to the replication origin have increased expression relative to 
genes close to the terminus (Slager et al., 2014; Rocha, 2008), 
essential genes tend to be co-oriented with the direction of repli- 
cation fork progression (Rocha, 2008), and the rate of mutation 
gradually increases with distance from the origin (Sharp et al., 
1989), although close proximity to the origin can lead to struc- 
tural alterations under conditions of replication stress (Slager 
et al., 2014). 

A genome’s elaborate program of DNA replication is therefore 
strongly connected to genome function and evolution and could, 
in principle, be an object of variation and selection itself. How- 
ever, it is not known whether DNA replication timing varies 
among members of the same species, nor whether such varia- 
tion is under genetic control. Previous studies have concluded 
that replication timing is globally similar among individuals of 
the same species (Ryba et al., 2010, 2012; Hiratani et al., 2008; 
Pope et al., 2011; Mukhopadhyay et al., 2014). We hypothesized
that this global similarity could still in principle coexist with inter- 
individual variation at many individual loci and that such variation 
might be used to find genetic influences on replication timing. 

RESULTS 

DNA Replication Timing Varies among Humans 

DNA replication results in dynamic changes in the copy number 
of each genomic locus; the earlier a locus replicates, the greater 
its average copy number in replicating (S phase) cells. To profile 
these differences across the genome, we have previously iso- 
lated G1 and S phase cells using FACS (Figure 1A), sequenced 
Figure 1. DNA Replication Timing Varies among Individuals at Specific Loci

(A) FACS-sorting cells by DNA content enables analysis of DNA copy number (by whole-genome sequencing) in G1 and S phase cells (adapted from Koren et al., 2012).

(B) Analysis of the ratio of DNA copy number between S and G1 phase cells along each chromosome allows the construction of replication timing profiles; early- 
replicating loci have a higher average copy number in S phase cells relative to late-replicating loci. Cells from different individuals show consistent replication 
timing programs across most of their genomes. In this and all subsequent figures, replication timing (and read depth) data are normalized to have a genome-wide 
mean of zero and SD of one; the y scale thus represents Z score units. 

(C) A genomic locus (gray shading) exhibits interindividual variation in DNA replication timing, with only three of the six individuals exhibiting a replication origin 
peak structure at this locus. Black lines: smoothed replication profiles. 

(D) An overlay of replication profiles from two individuals reveals a locus with variation in origin activity. 

(E) The local distribution of replication timing measurements across many adjacent data windows allows statistical detection of replication timing variants. The 
example depicts the distributions in the genomic region shown in (D). 

(F and G) Replication variants in which a replication origin (or origin cluster) is active in some individuals but inactive in others, as inferred from the presence or 
absence of a peak in the replication profiles. 

(H and I) Replication variants in which the average utilization or activation time of a replication origin varies among individuals, as inferred from differences in peak 
height. 



the DNA from both cell-cycle phases, and inferred replication 
timing from the long-range fluctuations in relative sequence 
abundance (the ratio of sequencing read depths from S- and 
G1-phase cells) along each chromosome (Figure 1B) (Koren
et al., 2012). To facilitate interpretation and comparison of repli- 
cation profiles, we normalize replication timing to units of SDs 
(Z score units, with a genome-wide average of zero and SD of 
one). Replication profiles provide information regarding the 
time of replication of each locus in the genome. They also pro- 
vide the estimated locations of replication origins, which are 
inferred from peaks along the replication profiles, where replica- 
tion is earlier than the replication of flanking sequences (Ra- 
ghuraman et al., 2001; Hawkins et al., 2013); in mammalian 
genomes, replication peaks correspond to either single origins 
or clusters of closely-spaced replication origins. In previous an- 
alyses of replication timing in lymphoblastoid cell lines from six 
individuals, we compared the individual-averaged profiles to 
patterns of mutations and variation in the human genome (Koren
et al., 2012) and compared the replication profiles of female
active and inactive X chromosomes (Koren and McCarroll, 2014).
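
The profiling steps described above (S/G1 read-depth ratio, Z score normalization, and origin inference from peaks) can be illustrated with a minimal Python sketch. The window depths, the moving-average smoother, and the simple local-maximum peak rule are assumptions chosen for illustration only; they are not the procedure detailed in the Extended Experimental Procedures.

```python
from statistics import mean, pstdev

def replication_profile(s_depth, g1_depth):
    """Z-scored S/G1 read-depth ratio per window (higher = earlier replication)."""
    ratios = [s / g for s, g in zip(s_depth, g1_depth)]
    mu, sd = mean(ratios), pstdev(ratios)
    return [(r - mu) / sd for r in ratios]

def smooth(values, half_width=1):
    """Centered moving average; windows at the edges are truncated."""
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half_width), min(len(values), i + half_width + 1)
        out.append(mean(values[lo:hi]))
    return out

def find_peaks(profile):
    """Indices of strict local maxima, taken here as candidate origins."""
    return [i for i in range(1, len(profile) - 1)
            if profile[i - 1] < profile[i] > profile[i + 1]]

# Hypothetical read depths in ten consecutive windows (toy data).
s_phase = [110, 130, 160, 135, 112, 105, 125, 150, 128, 104]
g1_phase = [100] * 10

profile = replication_profile(s_phase, g1_phase)  # genome-wide mean 0, SD 1
peaks = find_peaks(smooth(profile))               # candidate origin windows
```

On this toy input, the two windows with the highest relative S phase copy number are reported as candidate origins (or origin clusters).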

We sought in the current work to better ascertain and under- 
stand interindividual variation in DNA replication. The replication 
profiles of the six individuals closely matched one another 
across most of the genome (correlation coefficients r = 0.91-0.97
among all comparisons; Figures 1B and 1C), consistent
with earlier findings that profiles from different individuals are
broadly similar at genomic scales (Ryba et al., 2010, 2012; Hiratani
et al., 2008; Pope et al., 2011; Mukhopadhyay et al., 2014).
However, at scales of several hundred kilobases, we found 
that specific genomic loci exhibited clear differences in replica- 
tion timing among the six individuals (Figure 1C). A systematic 
search for replication variation identified 221 replication-variant 
loci (Figures 1D-1I; Extended Experimental Procedures available
online), each of which spanned 0.2-1.4 Mb (mean = 0.43 Mb). At
most variant loci, individuals differed qualitatively in the usage
of an origin (or origin cluster) (Figures 1C, 1D, 1F, and 1G), as



1016 Cell 159, 1015-1026, November 20, 2014 ©2014 Elsevier Inc. 




inferred from the presence of a peak in the replication profile; or 
quantitatively in the average utilization or activation time of a 
replication origin (Figures 1H and 1I), as inferred from variations
in the height of a peak. 

This analysis indicated substantial replication timing variation 
among humans but did not establish whether any aspect of 
this variation is under genetic control. Importantly, other factors 
could in principle influence the observed inter-individual varia- 
tions in replication timing, including epigenetic influences or 
even the growth states of the cells at the time of DNA harvesting 
and the transformation of the cells with EBV. To identify those 
replication phenotypes that consistently associate with specific 
alleles, genetic mapping requires analysis of DNA replication 
timing in far more individuals. However, studies of replication 
timing to date have involved small numbers of samples. 

DNA Replication Activity Is Visible in Whole-Genome 
Sequence Data 

Whole genome sequencing is increasingly used to study DNA 
sequence variation in large numbers of humans; some studies, 
such as the 1000 Genomes Project, use DNA samples extracted
from cultured, proliferating cells (1000 Genome Project Con- 
sortium, 2012). We hypothesized that active DNA replication 
might be visible in such data: the presence of S phase cells in 
such cultures could in principle cause long-range fluctuations 
in DNA copy number (measured by read depth) along each chro- 
mosome, with early replicating loci contributing more DNA than 
late-replicating loci. 

Array- and sequencing-based profiles of DNA copy number 
have long been known to contain megabase-scale “wave” pat- 
terns of copy number fluctuations that correlate with large-scale 
patterns of GC content along mammalian chromosomes (Mari- 
oni et al., 2007; Diskin et al., 2008; van de Wiel et al., 2009; Lep- 
retre et al., 2010; van Heesch et al., 2013; Aird et al., 2011). The 
sources of these GC-wave effects have been assumed to be 
technical. However, although GC content influences the effi- 
ciency of PCR amplification, GC-wave effects are present at 
megabase rather than subkilobase (amplicon) scales. Notably, 
GC content and DNA replication timing are highly correlated at 
megabase scales (Rhind and Gilbert, 2013), and DNA copy number
is typically measured in cell populations that are derived from
asynchronous, proliferating cell cultures that contain many cells 
in S phase. In fact, a recent study (contemporaneous with this 
work) noted a visual resemblance and statistical correlation be- 
tween a copy number profile (derived by array CGH) and DNA 
replication timing profiles (Manukjan et al., 2013). 

We designed a series of tests of the hypothesis that variation in 
sequencing coverage along chromosomes arises from true het- 
erogeneity in DNA copy number due to ongoing DNA replication. 
We first tested this hypothesis using whole-genome sequence 
data from Phase I of the 1000 Genomes Project (1000 Genome 
Project Consortium, 2012), which sequenced DNA from non- 
synchronized, proliferating LCL cultures. For each genome 
analyzed, we calculated read depth along each chromosome 
in sliding windows of 10 kb of uniquely alignable sequence, 
normalized for local GC content at amplicon (400 bp) scale 
(Handsaker et al., 2011) (Extended Experimental Procedures). 
Strikingly, in most LCL-derived genome sequences, fluctuations
in read depth along each chromosome matched the LCL replication
timing profiles that we had obtained by directly comparing
G1 to S phase cells (Figures 2A and 2B), suggesting that they 
reflect true differences in underlying DNA copy number arising 
from replicating cells. The presence of 5%-20% S phase cells
within a cell culture was sufficient to yield significant signals
of DNA replication timing (Figure S1E; Extended Experimental
Procedures).
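
The idea of measuring read depth in windows defined by alignable sequence rather than by genomic length can be sketched as follows. This is a simplified illustration under stated assumptions: the windows are consecutive rather than sliding, the mappability mask and read positions are toy data, the function names are hypothetical, and the GC normalization step is omitted.

```python
def alignable_windows(mappable, window_size):
    """Split a chromosome into consecutive windows that each contain
    `window_size` uniquely alignable positions.

    `mappable` holds one boolean per base. Returns half-open (start, end)
    intervals; a trailing partial window is dropped.
    """
    windows, start, count = [], 0, 0
    for pos, ok in enumerate(mappable):
        if ok:
            count += 1
        if count == window_size:
            windows.append((start, pos + 1))
            start, count = pos + 1, 0
    return windows

def window_depth(read_starts, windows, mappable):
    """Count reads whose start lies on an alignable base of each window."""
    counts = [0] * len(windows)
    idx = 0
    for pos in sorted(read_starts):
        # Advance to the window that could contain this read start.
        while idx < len(windows) and pos >= windows[idx][1]:
            idx += 1
        if idx == len(windows):
            break
        if pos >= windows[idx][0] and mappable[pos]:
            counts[idx] += 1
    return counts

# Toy chromosome of 11 bases; False marks non-unique sequence.
mappable = [True, True, False, True, True, True,
            False, False, True, True, True]
windows = alignable_windows(mappable, window_size=3)
depth = window_depth([0, 1, 2, 3, 5, 5, 8, 10], windows, mappable)
```

Because each window holds the same amount of alignable sequence, windows spanning repeats are physically longer but remain comparable in expected read count.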

To further test the hypothesis that active DNA replication 
causes long-range fluctuations in read depth, we utilized the 
fact that a subset of the 1000 Genomes Project samples were 
derived from blood instead of LCLs. Because circulating blood 
cells have generally exited the cell cycle, these samples do not 
contain cells in S phase and should not exhibit signatures of 
DNA replication timing. Consistent with this hypothesis, read 
depth in blood-derived DNA samples lacked the strong autocor- 
relative patterns along chromosomes that we observed in LCL- 
derived DNA and was uncorrelated with profiles of replication 
timing (Figures 2A, 2B, and S1; Extended Experimental Procedures).
In fact, we could distinguish blood-derived from
LCL-derived DNA samples with 100% sensitivity and specificity,
based solely on the relationship of their read depth profiles to 
our independent analyses of LCL replication timing (Figures 2A 
and S1; Table S1).

Importantly, correlations between read depth and replication 
timing remained strong after controlling for GC content, whereas 
correlations between read depth and GC content (at scales >10 
kb) were negligible after controlling for replication timing (Figures 
2B and S1), suggesting that previously observed correlations of
read depth with GC content (at 100 kb scales) are due to DNA 
replication timing. Furthermore, in a principal component anal- 
ysis (Patterson et al., 2006) of read depth along each of the 
LCL genomes, the strongest principal component, explaining 
40% of the variance, corresponded to our estimate of the S 
phase replication content of each sample, and the chromosomal 
loadings of this component followed the replication timing profile 
(Figures 2C and 2D; Extended Experimental Procedures). 
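
The logic of controlling one variable for another can be made concrete with a first-order partial correlation. The sketch below uses the standard textbook formula on constructed toy vectors; it is an illustration of the reasoning, not the analysis code used in this study, and the data are assumptions built so that GC content tracks replication timing while read depth depends only on timing.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation of x and y after controlling for z (first order)."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Toy per-window values: GC and depth each equal timing plus independent noise.
timing = [1, 1, 1, 1, -1, -1, -1, -1]
gc = [1.5, 1.5, 0.5, 0.5, -0.5, -0.5, -1.5, -1.5]
depth = [1.25, 0.75, 1.25, 0.75, -0.75, -1.25, -0.75, -1.25]

raw_depth_gc = pearson(depth, gc)                   # substantial
depth_timing_given_gc = partial_corr(depth, timing, gc)  # remains strong
depth_gc_given_timing = partial_corr(depth, gc, timing)  # vanishes
```

In this construction the raw correlation between depth and GC content is high, yet it disappears once timing is controlled for, which is the pattern reported above for the sequencing data.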

The X chromosome provided an additional strong prediction of 
the hypothesis that read depth at a locus reflects the replication 
timing of that locus. We recently found that the inactive X chromo- 
some in females undergoes a spatially unstructured, “random” 
form of replication (Koren and McCarroll, 2014). In light of that 
finding, the hypothesis that long-range fluctuations in read depth 
reflect active DNA replication predicts that inter-individual corre- 
lations in read depth along each chromosome would be weaker 
on the X chromosome in comparisons involving female genomes. 
The 1000 Genomes data abundantly confirmed this prediction 
(Figure 2E). This effect was not observed in blood samples (Fig- 
ure S2) and supports a biological, rather than technical influence 
on read depth, as technical influences do not discriminate be- 
tween sexes or chromosomes. 

A final strong test of the hypothesis that read depth reflects 
ongoing DNA replication was provided by a comparison of 
different cell types: LCLs and embryonic stem cells (ESCs). 
The DNA replication timing profiles of LCLs and ESCs differ 
across 20%-30% of the genome (Ryba et al., 2010; Hansen 
et al., 2010). We sequenced genomic DNA from proliferating 
ESCs and found that read depth in ESCs matched profiles of 
Figure 2. DNA Replication Activity Is Visible in Sequence Data from the 1000 Genomes Project

(A) Long-range fluctuations in read depth along chromosomes follow the DNA replication profile in DNA derived from cultured cells but not in DNA derived from 
blood. Shown are smoothed, Z normalized read depth profiles of genomic DNA from four 1000 Genomes samples derived from LCLs (red) and one DNA sample
derived from blood (gray), along with the LCL replication timing profile (blue). 

(B) Read depth is correlated with DNA replication timing to varying extents in different samples (as expected from samples with different proportions of cells in S 
phase), but is not correlated with GC content. Shown are partial correlations of (unsmoothed) read depth with replication timing (top) and with GC content 
(bottom), in each case controlling for the other variable (see Figure S1 for complete correlations and sample annotations). Each column corresponds to one of 946
individuals sequenced in the 1000 Genomes Project, sorted by their correlation between read depth and replication timing. Read depth in genomic DNA from 
blood samples did not correlate with replication timing. 

(C) DNA replication timing is the major influence on read depth variation among LCL samples, as determined by principal component analysis. Each circle 
represents one of 882 LCL samples; color indicates the correlation of read depth with replication timing. 

(D) The coefficients (chromosomal loadings) of the first principal component (in C) correspond to the DNA replication timing profile.

(E) A biological signature of the unstructured, “random” replication of inactive X chromosomes from females (Koren and McCarroll, 2014) is apparent in read
depth. Inter-individual correlations of read depth along the genome of 161 individuals (see text) are reduced on the X chromosome when comparisons involve a
female sample. 

(F) Sequencing of DNA from embryonic stem cells (ESCs) identifies ESC-specific replication timing profiles. Shown are read depth profiles of ESCs and LCLs 
derived from whole-genome sequencing, along with the corresponding S/G1 replication timing profiles. ESC replication timing data is from Ryba et al. (2010). 

(G) Read depth and replication timing closely track each other within a given cell type (ESC or LCL) and equally distinguish between cell types. Quantitative 
genome-wide comparison of read depth and replication profiles of ESCs and LCLs (two profiles of each are shown). LCL replication timing is from this study 
(profile 1) and Ryba et al. (2010) (profile 2). ESC replication timing data is from Ryba et al. (2010). RD, read depth; RT, replication timing. 

See also Figures S1 and S2 and Table S1.



replication timing in ESCs and LCLs wherever the profiles were
similar between cell types, but matched the ESC profiles wher- 
ever ESCs and LCLs had different replication timing profiles (Fig- 
ures 2F and 2G). Within a cell type, read depth and replication 
profiles were virtually indistinguishable, whereas many loci ex- 
hibited consistent differences between the two cell types that 
were visible in both whole-genome sequence and explicit repli- 
cation profiles (Figures 2F and 2G). 

DNA Replication Polymorphisms in a Population Cohort 

The results described above established the existence of replication
timing variation among humans and demonstrated that
read depth in whole-genome sequence data from proliferating
cells reflects active DNA replication. These observations raise
the intriguing possibility that one could use data from the 1000 
Genomes Project to study variation in replication timing within 
populations and to learn whether it is under genetic control. 

We searched for replication variants in the samples se- 
quenced by the 1000 Genomes Project, focusing on 161 DNA 
samples that appeared to be derived from cultures containing 
the largest fraction (5%-20%) of S phase cells at the time they 
were harvested (based on the correlations of read depth fluctu- 
ations to replication timing; Figure S1E; Table S2; Extended 
Experimental Procedures). We excluded genomic regions with 
evidence of copy number variation (CNV), visible as large-magni- 
tude, stepwise changes in copy number and focused on the 
Figure 3. Variation in DNA Replication Timing Is Common in the Human Population

(A) Patterns of read-depth variation among 1000 Genomes individuals indicate the presence of a polymorphic replication origin (gray shaded area). This is the
same replication variant shown in Figure 1C as variable in replication timing in the six individuals.

(B) Candidate replication variants identified in the population-based analysis of whole-genome sequence data from the 1000 Genomes Project significantly
overlap with replication variants identified from direct S/G1 replication profiling of six individuals. Black arrow: number of overlapping variants; blue bars: number
of overlapping variants in 10,000 permutations of variant locations.

(C) Loci with the greatest variation in read depth among blood-derived DNA samples from the 1000 Genomes Project did not significantly overlap with variants
identified by replication profiling.

(D) Replication variants collectively cover more than 10% of the mappable human genome. Shown is the length distribution of genomic regions affected by
replication variants.

(E) Forms of replication variation. The frequency of each variant type is indicated.

(F) The size distribution of replication variants (average replication timing/read depth differences between the early and late replication state in each variant).

(G) Comparison of the replication timing of the early and late states in each individual replication variant locus. Red line: replication difference of 1 SD; black dots:
shifts between early and earlier replication; blue dots: shifts between early and late replication (purple dots are loci that shift from under -0.5 to over 0.5, i.e., the
most significant changes between early and late replication); green dots: shifts between late and later replication.

See also Figure S4 and Tables S2 and S3.



lower-amplitude, continuous fluctuations in copy number that 
reflect active DNA replication (Figure S3; Extended Experimental 
Procedures). 

We identified 361 population variants in the read depth- 
derived DNA replication timing profiles (Extended Experimental 
Procedures). Replication variants identified from the 1000 Ge- 
nomes data significantly (p < 10“^®) overlapped the replication 
variants we had identified by direct replication profiling of six in- 
dividuals (Figures 3A and 3B; as a negative control, loci with the 
strongest read depth variation across the 64 blood-derived DNA 
samples exhibited no significant overlap with the replication var- 
iants in the six individuals; Figure 3C). To obtain a final set of 
replication variants, we combined the variants derived from the 
six individuals with those ascertained from the 1000 Genomes 
Project individuals and reevaluated the differences among all in- 
dividuals specifically in these regions (Extended Experimental 
Procedures). This resulted in a total of 477 variants (Table S3) 
that spanned 610 kb on average and cumulatively spanned 
292 Mb (Figure 3D). 
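
The significance of an overlap between two sets of variant loci can be assessed by permuting variant locations, as indicated for Figure 3B. The sketch below shows one simple way such a test could be set up; the single linear toy genome, the interval coordinates, the uniform placement of shuffled intervals, and the function names are all assumptions, and the actual permutation scheme is the one given in the Extended Experimental Procedures.

```python
import random

def count_overlaps(a, b):
    """Number of intervals in `a` overlapping at least one interval in `b`."""
    return sum(any(s < e2 and s2 < e for s2, e2 in b) for s, e in a)

def overlap_permutation_test(a, b, genome_length, n_perm=1000, seed=0):
    """Empirical p value for the observed overlap between `a` and `b`.

    Each permutation places every interval of `a` at a uniformly random
    position, preserving its length, and recounts overlaps with `b`.
    """
    rng = random.Random(seed)
    observed = count_overlaps(a, b)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        shuffled = []
        for s, e in a:
            length = e - s
            new_start = rng.randrange(0, genome_length - length + 1)
            shuffled.append((new_start, new_start + length))
        if count_overlaps(shuffled, b) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)

# Toy coordinates in kb on a hypothetical 100 Mb genome.
population_variants = [(1000, 1500), (20000, 20500), (40000, 40500),
                       (60000, 60500), (80000, 80500)]
profiled_variants = [(1200, 1700), (20100, 20600), (39800, 40300),
                     (60400, 60900), (80000, 80400)]

observed, p_value = overlap_permutation_test(
    population_variants, profiled_variants, genome_length=100_000)
```

With a finite number of permutations, the smallest p value that can be reported is 1/(n_perm + 1), so much smaller p values require either many more permutations or a parametric approximation.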

In over 50% of the variants, individuals differed by the pres- 
ence or absence of a read depth peak, which we interpret as a 
gain or loss of the activity of a replication origin or origin cluster 
(Figure 3E). Approximately 25% of replication variants involved 
quantitative variation in peak height, or the average utilization
or activation timing of a replication origin. The remainder of the
variants involved a shift of a replication slope region (transition 
region; Figure 3E), as could arise if a replication initiation zone 
was variable in length. Most replication variants were common, 
with each replication state at each locus shared among multiple 
individuals (Figures 3A, 3E, and S4); this at least partially reflects 
our ascertainment approach and does not preclude the possibil- 
ity of a larger number of rare replication variants that were not 
detected. 

DNA Replication Timing Is Influenced by cis-Acting
Genetic Variants 

The availability of replication timing information for 161 individuals
at hundreds of different sites made it possible to search for ge- 
netic influences on replication timing. We treated locus-specific 
replication timing as a quantitative trait (one for each replication 
variant locus) and analyzed the association of replication timing 
to sequence variation in the same individuals, using sequence 
variation data from the 1000 Genomes Project. To reduce the 
burden of multiple hypothesis testing, we also performed a 
cis-focused association test restricted to genetic variants near each 
replication variant region (Degner et al., 2012; Lappalainen 
et al., 2013; Kilpinen et al., 2013; McVicker et al., 2013; Kasowski 
et al., 2013) (Extended Experimental Procedures). 
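The quantitative-trait framing described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the additive 0/1/2 genotype coding, and the use of a per-SNP permutation p value are all assumptions made for illustration. The trait is the replication timing (read depth) measurement at one replication variant locus, and the predictor is the genotype dosage at one SNP in the same individuals.

```python
import random
from statistics import mean

def slope_and_r(genotypes, trait):
    """Least-squares slope and Pearson r of a trait on genotype dosage (0/1/2)."""
    g_mean, t_mean = mean(genotypes), mean(trait)
    sxx = sum((g - g_mean) ** 2 for g in genotypes)
    syy = sum((t - t_mean) ** 2 for t in trait)
    sxy = sum((g - g_mean) * (t - t_mean) for g, t in zip(genotypes, trait))
    if sxx == 0 or syy == 0:
        return 0.0, 0.0  # monomorphic SNP or constant trait: no association
    return sxy / sxx, sxy / (sxx * syy) ** 0.5

def permutation_p(genotypes, trait, n_perm=1000, seed=0):
    """Empirical two-sided p value obtained by shuffling genotypes across individuals."""
    rng = random.Random(seed)
    observed = abs(slope_and_r(genotypes, trait)[1])
    shuffled = list(genotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(slope_and_r(shuffled, trait)[1]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical data: 30 individuals, trait rises with allele dosage plus small noise.
genotypes = [0, 0, 1, 1, 2, 2] * 5
trait = [0.1 * g + 0.01 * ((i * 7) % 5) for i, g in enumerate(genotypes)]
slope, r = slope_and_r(genotypes, trait)
p_value = permutation_p(genotypes, trait)
```

Restricting the SNPs tested to those near each replication variant region, as in the cis-focused test, reduces the number of hypotheses and therefore the significance threshold that must be cleared.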
Figure 4. Replication Timing Quantitative Trait Loci 

Genetic variants underlie differences in DNA replication timing among individuals. Shown are three examples of replication variants with significant genetic 
association (additional examples are in Figures 5 and S5). 

(A) Variation in replication timing of a specific locus is strongly associated with SNPs that map within the locus itself. Shown are Manhattan plots of genome-wide 
association of genetic variants with replication timing. Red arrow: genomic location of the tested replication variant region. Black dashed line: genome-wide 
association significance threshold. 

(B) Detailed genetic associations in replication variant regions (dots; right axis) along with replication (read depth) profiles (left axis) for individuals with each of the 
three genotypes of the most strongly associated SNPs. Yellow dots denote rtQTL SNPs that were also eQTLs for a nearby gene. 

(C) Left panels: distribution of read depth for individuals with each of the genotypes of the SNP most strongly associated with each variant. Right panels: droplet 
digital PCR (ddPCR) analysis confirms that the allele associated with early replication is also overrepresented in genomic DNA from heterozygous individuals, 
consistent with a cis-acting, allele-specific effect on DNA replication timing. 

See also Tables S4 and S5. 



We identified 20 replication variant loci with significant sequence 
associations in cis (nominal p = 1 0“^ to 1 0“^^), of which 
eight were identified in the genome-wide scan and an additional 
twelve in the cis-localized scan (Figures 4, 5, and S5; Table S4). 
As with other genetic traits studied for association with common 
variants, replication-timing phenotypes tended to associate to 
haplotypes of many variants in linkage disequilibrium (LD) with 
one another. 

To critically evaluate these associations, we used data from an 
additional 334 samples from the 1000 Genomes Project; these 
samples, which had weaker signatures of replication timing (r = 
0.2-0.4), had not been included in the initial scan. At each of 
the 20 loci, we tested whether the “index SNP” (the most strongly 
associating SNP) from the initial analysis also associated with 
measurements of replication timing in the other samples. Despite 
the lower power to detect replication timing associations using 
these samples, 16 of the associated loci were replicated with p 
values of between 10“^"^ and 0.05, all with the same direction 
of allelic influence as the original samples. The index SNPs at 
the remaining four loci were not significant in the replication anal- 
ysis, reflecting an unknown combination of partial power and 
some false positives in the initial scan (Figure S5). 

We also searched for trans-associations (associations to var- 
iants outside the replication variant region); however, our sample 



set is composed of individuals from many different populations, 
making such an analysis vulnerable to artifacts of population 
structure. Indeed, the 17 identified putative trans-associations 
did not map to genes related to DNA replication or related path- 
ways and were not considered further (see Extended Experi- 
mental Procedures). 

We refer to genetic variants that associate with replication 
timing as replication timing quantitative trait loci (rtQTLs). The 
20 rtQTL haplotypes consisted entirely of SNPs and short indels 
(rather than large structural polymorphisms), indicating that fine- 
scale sequence variation can be sufficient to affect DNA replica- 
tion timing on megabase scales (we note that CNVs and other 
forms of variation could influence replication timing at loci not 
identified here). The implicated genetic variants were almost al- 
ways located in the immediate vicinity of a replication timing 
peak (median distance = 52 kb, p = 7.2 x 10“^) (Figures 4B 
and 5), suggesting that rtQTLs typically affect DNA replication 
by affecting replication origins. The implicated rtQTL haplotypes 
were generally small (2-160 kb, median = 20 kb), yet the regions 
whose replication timing associated with these haplotypes were 
4-600 times larger, encompassing 0.39-1.86 Mb (median = 0.66 
Mb) of surrounding sequence (Figures 4 and 5). 

Individuals heterozygous for rtQTL SNPs had replication 
timing phenotypes intermediate between those of homozygous 
Figure 5. rtQTLs Involve Variable Use of Replication Origins and Exert Long-Range Effects on Replication Timing 

rtQTLs involve associations with sets of markers in the immediate vicinity of replication origins and affect the replication timing of megabases of surrounding DNA. 
Plots are as in Figure 4B. The lower graphs in each panel (bold black line) show that replication timing differences gradually decrease with distance from rtQTL 
See also Figure S5 and Table S4. 



individuals (Figures 4B and 4C). This could be due to having one 
earlier- and one later-replicating version of the locus on their 
two chromosomal copies, if rtQTLs are due to allele-specific, 
cis-acting influences of DNA sequence on replication timing, as 
opposed to trans-acting or nongenetic effects. Individuals het- 
erozygous for rtQTL SNPs should therefore exhibit allelic asyn- 
chrony of replication at the rtQTL loci and have more copies of 
the early-replicating allele than the late-replicating allele in their 
genomic DNA. To test this prediction, we used droplet digital 
PCR (Hindson et al., 2011) to measure the allelic content of the 
genomic DNA at four rtQTL loci each in LCL-derived DNA from 
18-35 heterozygous individuals. At all four loci, the allele associ- 
ated with earlier replication timing (at a population level) also ex- 
hibited greater abundance (p = 0.005 - 5.3 x 10“®) within 
genomic DNA from heterozygous individuals (Figure 4C), while 
control SNPs that were not in LD with the rtQTL SNPs were not 
significantly skewed (Extended Experimental Procedures). These 
results confirm our sequencing- and population-based inference 
and are consistent with a model in which genetic variation affects 
replication timing in an allele-specific, cis-acting manner. 
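The allelic-skew prediction can be made concrete with a small sketch. The statistical test used for the ddPCR data is not specified in this section, so the one-sided exact sign test below is an assumed stand-in, and the droplet counts are hypothetical. For each heterozygous individual, the count for the early-replicating allele is compared with the count for the late-replicating allele; under the null hypothesis of no skew, either allele is equally likely to be the more abundant one.

```python
from math import comb

def sign_test_p(early_counts, late_counts):
    """One-sided exact sign test: is the early allele the more abundant one
    in more individuals than expected by chance? Ties are dropped."""
    wins = sum(e > l for e, l in zip(early_counts, late_counts))
    n = sum(e != l for e, l in zip(early_counts, late_counts))
    # P(X >= wins) for X ~ Binomial(n, 0.5)
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n

# Hypothetical droplet counts for 10 heterozygous individuals.
early = [1052, 1031, 1040, 1075, 1022, 1060, 1048, 1037, 1055, 1029]
late = [1000, 1004, 998, 1010, 1001, 1012, 995, 1003, 1007, 999]
p_skew = sign_test_p(early, late)
```

A sign test ignores the size of the skew in each individual; a test on the per-individual allele ratios would use more of the information in the data.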

DNA Replication Is Associated with a Long-Range Effect 
on Gene Expression Levels 

Replication origin activity is associated with open chromatin 
structure, and DNA replication timing is generally correlated 
with the levels of gene expression across a genome (Rhind 
and Gilbert, 2013). We therefore hypothesized that rtQTLs may 



operate by influencing chromatin states. We compared the loca- 
tions of rtQTLs to the locations of enhancers, defined as DNA 
segments of ~500 bp containing combinations of histone mod- 
ifications that promote expression of nearby genes (Ernst et al., 
2011). We found a significant enrichment of rtQTLs within en- 
hancer regions that were specifically active in LCLs (out of nine 
cell types examined; Table S6); 11 of 20 rtQTL loci contained 
sequence variants within LCL enhancers, even though the latter 
cover <1% of the genome (enrichment P < 10“^®). This relationship 
suggested that rtQTLs may affect DNA replication by 
promoting an open chromatin structure, prompting us to analyze 
more closely their relationship to gene expression. 
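To give a sense of the scale of this enrichment, the sketch below computes a simple binomial tail probability. This is an illustrative simplification and not necessarily the test the authors used: it treats each of the 20 rtQTL loci as one independent draw that lands in an LCL enhancer with probability equal to the genome fraction covered by enhancers, taken here as the stated upper bound of 1%.

```python
from math import comb

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

# 11 of 20 rtQTL loci overlap LCL enhancers; enhancers cover <1% of the genome.
p_enhancer = binomial_tail(11, 20, 0.01)
```

Because each rtQTL haplotype contains many variants spread over kilobases, the per-locus overlap probability is higher than the raw genome fraction, so this calculation overstates the significance relative to a test that accounts for haplotype size.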

To explore in more detail the relationship between DNA repli- 
cation timing and gene expression levels at regions implicated 
by rtQTLs, we utilized a recent RNA-seq analysis of gene ex- 
pression in 462 LCL samples from the 1000 Genomes Project 
(Lappalainen et al., 2013). We first compared the locations of 
expression quantitative trait loci (eQTLs) identified in the RNA- 
seq study with the locations of rtQTLs. At nine of the 20 rtQTL 
loci, the implicated SNPs overlapped cis-eQTLs (Figures 4B, 5, 
and S5), even though eQTLs comprised <0.02% of the genome 
(enrichment x^P< 10“^®). Moreover, in eight of those nine cases, 
the rtQTL alleles that associated with early replication were also 
the alleles associated with elevated expression levels. This 
observation provides independent confirmation that our rtQTL 
findings, which were made entirely from genomic DNA (without 
any analysis of RNA), relate to functional aspects of genome 
Figure 6. Replication Timing Associates with Gene Expression Levels 

Individuals whose genomes exhibit earlier replication at a replication variant locus also tend to exhibit higher average expression of genes across the entire zone 
of replication. 

(A) Correlations between expression levels and replication timing, for the subset of rtQTL loci affecting the replication timing of expressed genes (16 of the 20 
rtQTL loci), across 53 individuals, for each gene within the rtQTL-implicated replication variant regions. Dashed black lines: replication variant region borders; red 
lines: rtQTL association region. 

(B) The correlation between replication timing and gene expression decreases as a function of gene distance from the rtQTL SNPs. 

(C) The distribution of correlations between replication timing and gene expression across individuals, for all replication variants that contained expressed genes. 
See also Figures S6 and S7 and Table S6. 



biology. Furthermore, the tendency of expressed genes to be in 
early replicating regions of the genome may reflect shared ge- 
netic influences (e.g., influences of genetic variation on open 
chromatin). 

An important distinction between eQTLs and rtQTLs is that 
most eQTLs directly affect the expression of genes in their imme- 
diate vicinity (median distance of 20 kb between SNPs and gene 
promoter, for the eQTLs overlapping rtQTLs), whereas the 
rtQTLs associate with the replication timing of megabases of 
surrounding DNA (median = 660 kb). The order-of-magnitude 
difference in the scale of the effects of rtQTLs and eQTLs provided 
a unique opportunity to address a long-standing question: 
can DNA replication timing itself influence gene expression 
levels in proliferating cells? We addressed this question by 
testing for elevated expression of genes across the entire, meg- 
abase-scale regions affected by rtQTLs. Focusing on 53 individ- 
uals for which both gene expression and replication timing data 
were available, we compared interindividual variations in replica- 
tion timing to interindividual variation in gene expression levels in 
each of the 20 regions implicated by rtQTLs. At each locus, we 
considered both an aggregate measure of gene expression 
(across all genes in the replication-affected region) (Figure S6) 



as well as the relationship to each individual gene (Figure 6). 
Individuals with earlier replication of a locus strongly tended to 
also have higher expression levels of genes throughout the locus 
(Figure S6), including modest but consistent relationships to 
expression variation for almost every individual gene (Figure 6). 
Strikingly, early replication timing consistently correlated with 
greater gene expression up to distances of ~500 kb, an order 
of magnitude larger than the typical range of eQTLs, or of the 
nine eQTLs that overlapped with rtQTLs (Figure 6). These results 
suggest that replication timing can regulate gene expression 
levels in proliferating cells and that such effects can be exerted 
over long genomic distances. 
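The per-gene comparison described above amounts to correlating, across individuals, a locus's replication timing with each gene's expression level and then examining how that correlation changes with distance from the rtQTL SNPs. A minimal sketch follows; the function names and the toy data are hypothetical, and Pearson correlation is assumed as the measure of association.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def correlation_by_distance(timing, genes):
    """genes: list of (distance_bp_from_rtQTL, expression_per_individual).
    Returns (distance, r) pairs sorted by increasing distance."""
    return sorted((dist, pearson(timing, expr)) for dist, expr in genes)

# Hypothetical data: 5 individuals, two genes at different distances.
timing = [1, 2, 3, 4, 5]
genes = [(300_000, [2, 1, 4, 3, 5]), (10_000, [2, 4, 6, 8, 10])]
profile = correlation_by_distance(timing, genes)
```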

The relationships of early replication to elevated levels of gene 
expression across individuals also extended to the remainder of 
the 477 replication timing variants (for which rtQTLs have not 
currently been identified) (Figure 6C), and replication variant sites 
were significantly enriched for eQTLs compared to random 
genomic sites (Figure S7). 

Finally, we note that despite the links between DNA replication 
timing and gene expression, three rtQTL loci were almost 
completely devoid of transcription (Figure 6A). Thus, while repli- 
cation timing and gene expression may share some regulatory 
Figure 7. An rtQTL at the JAK2 Locus 

A common allele at a SNP downstream of JAK2, previously associated with 
increased JAK2 mutation rates, is also associated with very early replication 
(higher peak) of an adjacent origin in an early replicating fragile site (ERFS) 
region. JAK2 (dashed vertical lines) is transcribed toward the inferred repli- 
cation origin (the peak). The heights of the black points show the level of as- 
sociation of SNPs to the replication timing of this locus, on the scale shown on 
the right. Diagram on the bottom depicts the location and transcriptional 
orientation of JAK2 compared to the direction of replication fork progression 
from the nearby origin. 

influences (such as open chromatin), each process appears to 
be independently controlled. In particular, transcription is not 
required for the establishment of rtQTLs. 

An rtQTL Links Early Origin Activity to JAK2 Mutations 
that Lead to Myeloproliferative Neoplasms 

An intriguing implication of rtQTLs is that inherited alleles could 
modify mutation rates in their genomic vicinity by affecting the 
replication timing of nearby DNA. A medically important example 
of polymorphism-associated mutation rates involves the Janus 
Kinase 2 (JAK2) locus. JAK2 is strongly expressed in blood cells 
including hematopoietic stem cells, B lymphocytes, and LCLs; 
because JAK2 transduces growth signals, activating JAK2 mutations 
that arise in individual cells cause clonal 

expansions that result in myeloproliferative neoplasms and can 
transform into hematological malignancies. These activating 
JAK2 mutations have been shown to arise more frequently in 
carriers of a “predisposing haplotype” defined by specific alleles 
at genetic markers across JAK2 (Olcaydu et al., 2009; Jones 
et al., 2009; Kilpivaara et al., 2009) and to arise in cis with respect 
to this haplotype (i.e., on the same chromosomal copy) (Olcaydu 
et al., 2009; Jones et al., 2009; Kilpivaara et al., 2009). The mech- 
anism underlying this relationship is unknown. JAK2 has also 
been identified as an early replicating fragile site (ERFS) in B lym- 
phocytes (Barlow et al., 2013). ERFSs are genomic loci at which 
early origin activation can lead to double strand breaks, particu- 
larly in the presence of nearby transcription, with consequently 
elevated mutation rates at distances of up to hundreds of kilo- 
bases from the break site (Barlow et al., 2013; Pedersen and 
De, 2013; Drier et al., 2013; Jones et al., 2013; Deem et al., 
2011; Wang and Vasquez, 2004). 



We evaluated the possibility that replication timing variation 
could explain the mutability of the JAK2 haplotype and specif- 
ically that the mutation-predisposing haplotype is an rtQTL. We 
found a replication variant near JAK2, which was just below 
the significance threshold of our genome-wide screen for repli- 
cation variants. Replication at JAK2 involved an unusually early 
replicating origin (i.e., a high peak on the replication profile; Fig- 
ure 7), consistent with the identification of the same locus as an 
ERFS (Barlow et al., 2013). We also found that the direction of 
replication fork progression is opposite the direction of JAK2 
transcription (Figure 7), consistent with a model in which chro- 
mosome fragility is enhanced by head-on collisions between 
the replication and transcription machinery. Most importantly, 
the inherited JAK2 alleles that predispose to JAK2 mutations 
all associated strongly (p < 4.5 x 10“"^) with earlier or more effi- 
cient activation of the origin (i.e., a higher replication peak; Fig- 
ure 7) and were among the peak SNPs for the rtQTL (Figure 7). 
Taken together, these data are consistent with a model in which 
chromosome fragility, enhanced by interference between the 
replication and transcription machinery, underlies the mutations 
in JAK2 and does so more frequently in individuals in whom repli- 
cation activity from the origin is earlier and/or more efficient. 

DISCUSSION 

How eukaryotic genomes specify the timing of replication origin 
activation is a long-standing mystery. We show here that locus- 
specific replication timing varies among humans and is influ- 
enced by inherited genetic polymorphism. Replication variants 
involve alterations in the replication timing of large (200 kb-2 
Mb) chromosomal regions. Most if not all of these variants relate 
to differences in replication origin (or origin cluster) activity, as in- 
ferred from replication timing peak structures. We discovered 
SNP haplotypes that associate with DNA replication timing, 
which we call replication timing QTLs (rtQTLs). The genetic vari- 
ation implicated at rtQTLs tends to be at or very close to the in- 
ferred replication origin. Given the overlap between rtQTLs and 
enhancers in the same cell type, rtQTLs may affect DNA replica- 
tion by promoting an open chromatin structure that is permis- 
sible for origin firing. Alternatively, some rtQTLs may alter the 
DNA sequences bound by factors that promote origin firing. 
Understanding the mode of action of rtQTLs will illuminate the 
complex process of replication timing control. 

To study DNA replication timing, we made use of genome 
sequence data from the 1000 Genomes Project, which was de- 
signed primarily as a study of genome sequence variation (and 
not a functional study of DNA replication). As a result, the discov- 
ery power in the current study was limited by the low read depth 
(3-5×), the relatively small number of individuals analyzed, and 
the lack of any deliberate enrichment for S phase cells. Conse- 
quently, we have likely found only the rtQTLs with the strongest 
effect on replication timing and that arise from common alleles. 
Replication timing is likely shaped by multiple genetic and epige- 
netic factors and will require more powerful analyses to identify 
the full sets of underlying factors at each locus. We expect that 
subsequent work will identify far more rtQTLs in LCLs and other 
cell types. Identification of a larger number of rtQTLs will facilitate 
the analysis of their common features and their molecular mode 
of action and pave the way for an understanding of the regulation 
of replication origin activity. Furthermore, identification of the 
causal variants that control replication origin activity will make 
it possible to manipulate replication timing experimentally, 
providing new ways of investigating the causes and conse- 
quences of DNA replication timing. 

An intriguing implication of our results relates to the relation- 
ship between DNA replication timing and the generation of 
mutations. DNA replication timing is associated with mutation 
rate variation across the genome in two important ways. First, 
late-replicating DNA is generally more prone to mutation than 
early-replicating DNA. Late replication is also associated with 
increased levels of DNA breakage at common fragile sites 
(CFSs) (Letessier et al., 2011); notably in this regard, the replication 
variants we identified overlap 19 CFSs, including FRA3B, the 
most common fragile site in lymphocytes. Second, elevated mu- 
tation rates also occur in regions with high transcriptional activity 
in the vicinity of early replicating origins due to collisions between 
the replication and transcription machineries, which lead to 
chromosome fragility, double strand breaks, single-stranded 
DNA (ssDNA) formation, and error-prone DNA synthesis (Barlow 
et al., 2013; Pedersen and De, 2013; Drier et al., 2013; Jones 
et al., 2013; Deem et al., 2011; Wang and Vasquez, 2004). Ge- 
netic variants that affect DNA replication timing therefore have 
the potential to affect mutation rates in their vicinity. Such an 
effect would have important implications for evolution and for 
disease. First, rtQTL alleles conferring regional late replication 
or early origin activity in the vicinity of active genes could function 
as cis-acting mutators that cosegregate, via genetic linkage, 
with the mutations they induce, providing a mechanism for 
evolutionary optimization of local mutation rates in sexual spe- 
cies (Martincorena and Luscombe, 2013). Second, rtQTLs may 
serve as common, inherited genetic polymorphisms that affect 
the probability of somatic mutation at specific loci. Diseases 
with high heritability are often assumed to be distinct from dis- 
eases of somatic mutation. Our results suggest, however, that 
inherited polymorphism can consign a genomic region to late 
replication or create early replicating fragile sites in a particular 
tissue, thereby increasing the likelihood that it will acquire so- 
matic mutations in that tissue. At the JAK2 kinase locus, for 
example, the same SNP haplotype is associated with both early 
origin activation and elevated mutation rates that can lead to 
myeloproliferative neoplasms. Altered replication timing in a rele- 
vant cell population could thus be a means by which inherited 
variation influences somatic mutation rates and, consequently, 
disease and cancer susceptibility. 

The presence of a substantial subpopulation of S phase cells 
in expanding cell cultures appears to endow whole-genome se- 
quences derived from such samples with information about 
ongoing DNA replication activity. The influence of DNA replica- 
tion is directly related to the proportion of cells that are in S 
phase, which for cultured cells depends on their growth phase: 
exponentially growing cultures will contain the largest fraction 
of replicating cells, while quiescent cultures will tend to contain 
mostly cells in G1 phase. Replication timing could influence 
any measurement of DNA content (array- or sequencing-based) 
that has been made from proliferating cells (e.g., studies of copy 
number variation and chromatin states). Copy number detection 



in single cells, for example during preimplantation genetic diag- 
nosis (PGD), is also more prone to false CNV detection when a 
cell is in S phase (Dimitriadou et al., 2014). Replication timing 
will need to be carefully considered as a potential confounding 
variable in genomic studies. On the other hand, the sequencing 
of genomic DNA derived from proliferating cells could become a 
routine way of studying replication dynamics. This approach will 
enable the study of DNA replication dynamics in a wide range of 
experimental conditions, cell types, and species, in a technically 
straightforward way. 

EXPERIMENTAL PROCEDURES 

Replication variants were discovered in replication timing data of six individ- 
uals (Koren et al., 2012) by pairwise comparisons of consecutive 200 kb win- 
dows along the genome, selection of windows with a p value <10“^° (t test), 
and consolidation of significant windows within 200 kb of other significant win- 
dows into discrete variant loci. Read depth measurements in 10 kb windows 
from samples from the 1000 Genomes Project (1000 Genome Project Con- 
sortium, 2012; Handsaker et al., 2011) were compared to replication timing 
profiles; for the 161 samples with a genome-wide correlation of >0.4 between 
read depth and replication timing, replication variants were identified as above 
and the two lists of replication variants were consolidated into a total of 477 
replication variant loci. At each locus, quantitative measurements of replication 
timing were derived from the 1000 Genomes data across the 161 individuals 
and were correlated with the genotypes (from the 1000 Genomes Project) of 
these same individuals. One thousand permutations of sample genotypes 
were performed in order to obtain an empirical significance threshold for 
associations with genetic variants. We performed a genome-wide association test 
with over 7.5 million genetic variants with an allele frequency >0.05 in the 
tested individuals, as well as a cis-focused association test with genetic 
variants only within each replication variant locus. rtQTLs were validated using 
droplet digital PCR (Hindson et al., 2011) with allele-specific probes. 
Expression data and eQTLs were from Lappalainen et al. (2013). The two human 
embryonic stem cell lines used for this study, HUES64 and HUES63, were 
donated for research following informed consent under protocols reviewed 
and approved by the Committee on the Use of Human Subjects (IRB) and 
the Embryonic Stem Cell Research Oversight Committee (ESCRO) at Harvard 
University. See Extended Experimental Procedures for further details. 
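The consolidation step described above, in which significant 200 kb windows lying within 200 kb of one another are merged into discrete variant loci, can be sketched as a standard interval-merging pass. This is an assumed reading of the procedure, not the authors' code: the function name and coordinates are hypothetical, windows are taken to be (start, end) pairs on a single chromosome, and "within 200 kb" is interpreted as a gap of at most 200 kb between the end of one window and the start of the next.

```python
def consolidate_windows(windows, max_gap=200_000):
    """Merge (start, end) windows on one chromosome whose gap is <= max_gap bp."""
    loci = []
    for start, end in sorted(windows):
        if loci and start - loci[-1][1] <= max_gap:
            # Close enough to the previous locus: extend it.
            loci[-1][1] = max(loci[-1][1], end)
        else:
            # Too far away: start a new locus.
            loci.append([start, end])
    return [tuple(locus) for locus in loci]

# Hypothetical significant windows (bp coordinates).
significant = [(0, 200_000), (200_000, 400_000), (500_000, 700_000),
               (1_200_000, 1_400_000)]
variant_loci = consolidate_windows(significant)
```

In this toy input the first three windows are joined into one locus and the fourth, separated by a 500 kb gap, remains a locus of its own.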
SUMMARY 

Endocytic cargo and Rab GTPases are segregated to 
distinct domains of an endosome. These domains 
maintain their identity until they undergo fission to 
traffic cargo. It is not fully understood how segrega- 
tion of cargo or Rab proteins is maintained along the 
continuous endosomal membrane or what machin- 
ery is required for fission. Endosomes form contact 
sites with the endoplasmic reticulum (ER) that are 
maintained during trafficking. Here, we show that 
stable contacts form between the ER and endosome 
at constricted sorting domains, and free diffusion of 
cargo is limited at these positions. We demonstrate 
that the site of constriction and fission for early and 
late endosomes is spatially and temporally linked to 
contact sites with the ER. Lastly, we show that 
altering ER structure and dynamics reduces the effi- 
ciency of endosome fission. Together, these data 
reveal a surprising role for ER contact in defining 
the timing and position of endosome fission. 

INTRODUCTION 

The endocytic pathway is used to internalize components pre- 
sent on the plasma membrane and in the extracellular fluid. After 
internalization, endocytosed cargo is sorted at multiple steps 
during trafficking. Cargo destined for degradation at the lyso- 
some is sorted away from both recycled cargo destined for the 
plasma membrane and cargo trafficked to the Golgi. The sorting 
of these cargoes must occur prior to endosome fission, and 
multiple machineries and mechanisms have been identified 
that contribute to this process (Hanyaloglu and von Zastrow, 
2008; Maxfield and McGraw, 2004; Seaman, 2008). Microtu- 
bules and their motor proteins, branched actin networks gener- 
ated by the Arp2/3 activator WASH, the retromer, and structural 
membrane shaping proteins such as sorting nexins (SNX), 
have all been implicated in endosome structure and cargo 
sorting (Gautreau et al., 2014; Hunt et al., 2013; Puthenveedu 
et al., 2010). However, it is not known what regulates the timing 
and position of membrane fission to separate the sorted 
compartments. 

Functional contact sites have been observed between the 
endoplasmic reticulum (ER) and endosomes (Alpy et al., 2013; 



Eden et al., 2010; Rocha et al., 2009). Measurements by electron 
microscopy and tomography have revealed that contact sites 
between the ER network and individual endosomes exist at multiple 
discrete positions around the endosome, which additively 
cover only ~5% of the endosome surface area (Alpy et al., 
2013; Friedman et al., 2013). Despite the abundance and 
discrete nature of these contacts, they appear to be tightly 
coupled because the two organelles maintain contact even as 
they are trafficked on the microtubule network (Friedman et al., 
2013; Zajac et al., 2013). Endosomes become bound to the ER 
early in their biogenesis and this association increases with 
maturation: we found >99% of late endosomes are tightly asso- 
ciated with the ER as they traffic, in contrast to ~50% of early 
endosomes (Friedman et al., 2013). Thus, ER contact could 
regulate the biogenesis of endosomes or become targeted to 
endosomes following a maturation step. Once established, ER 
contact with endosomes is often maintained despite trafficking, 
and this suggests important functions occur at the interface. 

Two functions have been demonstrated to occur at the ER-en- 
dosome interface (van der Kant and Neefjes, 2014). Interactions 
between (VAMP)-associated protein A (VAP-A) on the ER and the 
endosome localized partners are thought to regulate cholesterol 
sensing and lipid transfer. For example, the endosomal protein 
ORP1L interacts with VAP-A under low cholesterol conditions 
that could allow for cholesterol exchange, thereby acting as a 
sensor (Rocha et al., 2009). ER-endosome contact also occurs 
via the ER-localized phosphatase, PTP1B, which interacts with 
epidermal growth factor receptor (EGFR), dephosphorylating it 
to promote incorporation into intraluminal vesicles, a necessary 
step for EGFR degradation by the lysosome (Eden et al., 2010). 
Several recent papers also suggest that late endosomes may 
take up Ca2+ from ER stores during their maturation process; 
however, it remains to be determined whether Ca2+ is directly 
transferred at the interface (Kilpatrick et al., 2013; López-Sanjurjo 
et al., 2013; Morgan et al., 2013). 

The ER also forms contacts with several other organelles 
(Helle et al., 2013), and its role at these various sites may be analogous. 
At mitochondria and the plasma membrane, the ER provides 
Ca2+ in different functional contexts (Elbaz and Schuldiner, 
2011). Emerging evidence also shows lipids are modified or 
transferred at the ER interface (Stefan et al., 2013; Toulmay 
and Prinz, 2011). Recently, we discovered that ER tubules 
circumscribe mitochondrial constrictions and define the position 
of mitochondrial fission (Friedman et al., 2011). We predicted 
that mechanisms of membrane fission may also be conserved 
between various organelles. Here, we hypothesized and tested 
Figure 1. Early Endosome Fission Occurs at ER Contact Sites 

(A) Merged images of a live Cos-7 cell expressing mCh-Rab5 (EE in cyan), GFP-Rab4 (EE in red), and BFP-Sec61β (ER in green) that is pulse-labeled with EGF 
conjugated to Alexa Fluor 647 (cargo in blue). 

(B) Magnified time-lapse images of the region boxed in (A) show an example of early endosome fission. Merged images show the relative location of Rab4, Rab5, 
ER, and EGF, as indicated over time. The exiting Rab4+ bud is marked by a yellow arrowhead. See also Movie S1. 

(C) Trace outline of the endosome shown in (B) and the corresponding line-scan analysis of the relative fluorescence intensity (FI) of Rab4, Rab5, ER, and EGF for 
time points: t = 0 s (Pre-ER), t = 10 s (Pre-Fission), and t = 15 s (Post-Fission). Images in (B) and line-scan in (C) reveal that a dynamic ER tubule (marked by a blue 
arrow) is recruited to the divide between Rab4 compartments just before fission (at t = 10 s). 

(D) Merged images taken of a live Cos-7 cell transfected as in (A) that is instead pulse-labeled with Tf conjugated to Alexa Fluor 647 (cargo in blue). 

(legend continued on next page) 
whether ER contacts define the timing and the position of endosome 
fission. To test this, we visualized Rab partitioning, cargo 
sorting, and endosome fission relative to the ER network in live 
Cos-7 cells. 

RESULTS 

Dynamic ER Tubules Make Contact with Early 
Endosome Fission Sites 

To visualize early endosome sorting and fission events relative to 
the ER, cells were cotransfected with GFP-Rab4, mCh-Rab5, 
and with the general ER marker BFP-Sec61 p and were imaged 
by live confocal fluorescence microscopy (Figures 1A and 1D). 
Rab4 localizes to both the vacuolar and budding compartment 
of the early endosome (Sonnichsen et al., 2000), while Rab5 is 
exclusively localized to the vacuolar compartment (Sonnichsen 
et al., 2000; Trischler et al., 1999). Partitioning of cargo also occurs
within the early endosome sorting domains. To ensure that
the endosomes we visualized were functional, cells were pulse-labeled
with epidermal growth factor (EGF) (Figures 1A-1C) or
transferrin (Tf) (Figures 1D-1F) cargo conjugated to Alexa Fluor
647. The Rab4 and Rab5 endosomes contain labeled cargo,
and their localization and morphology are consistent with previous
reports (Sonnichsen et al., 2000; Trischler et al., 1999) (Figures
1A and 1D).

Upon fission, a smaller Rab4+ compartment buds off of a
larger Rab4+/Rab5+ vacuolar compartment (Figures 1B and 1E,
top panels; Movies S1 and S2 available online). The fluorescent
EGF is confined to the vacuolar Rab4+/Rab5+ compartment and
does not enter into the bud (Figure 1B, panel row 3, yellow arrowhead;
Movie S1). In contrast, Tf was not confined to a compartment
and was found in the Rab4+ bud as well as the Rab4+/Rab5+
compartment (Figure 1E, panel row 3, yellow arrowhead;
Movie S2). Live cell imaging (Figures 1B and 1E) and the corresponding
line-scan analysis (Figures 1C and 1F) reveal that in
both examples, an ER tubule moves into position at the divide 
between the two preformed compartments just before fission 
occurs (compare the position of the ER tubule relative to the 
two sorting compartments in the Pre-ER and Pre-Fission images 
and line-scans). These data reveal that ER tubules mark the po- 
sition of early endosome fission and that there is a temporal rela- 
tionship between ER recruitment and fission. 

ER Tubules Contact Sorting Domains on Early 
Endosomes 

It is difficult to quantitate whether the ER marks all early endo- 
some fission events because early endosomes are small and 
fission is rapid, hindering our ability to capture instances of clear 
constriction followed by fission. Thus, we sought ways to slow 
fission and/or inhibit it at intermediate stages to better resolve 
the relationship between ER tubules and the position of early en- 
dosome constriction and fission. This was accomplished by 
treating cells with dynasore, a drug which has been shown to 



inhibit Dynamin-1, Dynamin-2, and Drp1 in vitro (Kirchhausen
et al., 2008; Macia et al., 2006). As previously reported (Derivery
et al., 2009; Mesaki et al., 2011), Cos-7 cells treated with dynasore
accumulate tubular early endosomes (Figure 2A). This
phenotype could be a consequence of dynamin inhibition or it
could be due to off-target effects, which have been reported
(Park et al., 2013). However, our data demonstrate that dynasore
treatment is a very useful tool to slow the process of endosome 
fission and image bona fide fission intermediates at ER contact 
sites. 

Cells were cotransfected with markers of early endosomes
(mCh-Rab5) and the ER (GFP-Sec61β) and were treated with dynasore
just prior to imaging. Treatment led to the formation of
elongated Rab5+ endosomes and did not alter the structure of
the ER (Figure 2A) (Derivery et al., 2009; Mesaki et al., 2011).
Like untreated endosomes, dynasore-induced tubular endosomes
contain both degraded and recycled cargo that partition
as expected (Figures S1A and S1B). To further test if tubular endosomes
had features characteristic of native endosomes, we
investigated whether they contained domains marked by the 
WASH complex (Figure 2). WASH is a multiprotein complex 
that activates Arp2/3-mediated actin nucleation at sorting do- 
mains (Derivery et al., 2009; Gomez and Billadeau, 2009; 
Harbour et al., 2012; Jia et al., 2010), and this actin nucleation 
may participate in cargo sorting (Puthenveedu et al., 2010), or 
generate force to catalyze fission. To mark the complex, we
tagged FAM21, which has been shown to localize to early endosomes,
bind the retromer complex, and regulate recruitment of
WASH to endosomal sorting domains (Seaman et al., 2013).
Cells were cotransfected with GFP-Rab5, BFP-Sec61β, and
mCh-FAM21. In dynasore-treated cells, FAM21 localized to
patches and punctate structures along tubular endosomes (Figure
2B). Line-scan analysis demonstrated that FAM21-labeled
structures corresponded to positions where Rab5 fluorescence
was minimal, indicative of a constriction (see arrows in Figures 2C
and 2D). ER tubules also crossed over the positions of FAM21-marked
constrictions (see arrows in Figures 2C and 2D). These
data demonstrate that sorting domains localize to dynasore-induced
elongated endosomes and ER tubule crossings are
spatially linked to these domains.
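
The line-scan reasoning above (a dip in Rab5 signal that coincides with a peak in FAM21 or ER signal) can be sketched in a few lines of Python. This is a minimal illustration and not the analysis pipeline used in the study: the function names, the one-pixel tolerance, and the intensity values are all hypothetical.

```python
def local_minima(profile):
    """Indices where the value is strictly lower than both neighbors."""
    return [i for i in range(1, len(profile) - 1)
            if profile[i] < profile[i - 1] and profile[i] < profile[i + 1]]


def local_maxima(profile):
    """Indices where the value is strictly higher than both neighbors."""
    return [i for i in range(1, len(profile) - 1)
            if profile[i] > profile[i - 1] and profile[i] > profile[i + 1]]


def marked_constrictions(rab5, marker, tolerance=1):
    """Rab5 minima lying within `tolerance` pixels of a marker peak.

    `tolerance` is an assumed parameter, not a value from the paper.
    """
    peaks = local_maxima(marker)
    return [m for m in local_minima(rab5)
            if any(abs(m - p) <= tolerance for p in peaks)]


# Made-up relative fluorescence intensities along a tubular endosome.
rab5 = [0.9, 0.8, 0.3, 0.8, 0.9, 0.7, 0.2, 0.6, 0.9]
er = [0.1, 0.2, 0.9, 0.3, 0.1, 0.2, 0.8, 0.3, 0.1]
print(marked_constrictions(rab5, er))
```

In this toy profile, both Rab5 dips sit under an ER peak, so both positions are reported as ER-marked constrictions.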

ER Tubules Contact Endosomes at a Barrier to Cargo 
Diffusion 

Our data suggest that dynasore treatment can stall fission at ER- 
marked endosome sorting domains, and line-scan analysis indi- 
cates endosomes are constricted at ER contact sites (Figure 2D). 
To further establish the importance of the ER at stalled constric- 
tions, we reasoned a functional constriction has the potential to 
impede the diffusion of cargo. According to the bulk flow hypoth- 
esis, membrane-bound cargo that is recycled to the plasma 
membrane, like transferrin receptor (TfR), diffuses freely in the 
membrane (Dunn et al., 1989; Mayor et al., 1993). We designed 
photobleaching techniques to test whether a diffusion barrier 



(E) Zoom from (D) shows the relative localization of Rab4, Rab5, ER, and Tf as indicated over time. An ER tubule (marked by blue arrow) is again recruited to the
divide between Rab4 compartments just before fission (t = 35 s). An exiting Rab4+/Tf+ bud is marked by a yellow arrowhead. See also Movie S2.

(F) Line-scan analysis of relative FI depicts the position of a dynamic ER tubule to the position and timing of endosome fission for time points: t = 0 s (Pre-ER), t =
35 s (Pre-Fission), and t = 45 s (Post-Fission). Scale bars represent 5 μm in (A) and (D) and 1 μm in (B) and (E). s, seconds.



Cell 159, 1027-1041, November 20, 2014 ©2014 Elsevier Inc. 1029




[Figure 2 image panels: GFP-Rab5, mCh-FAM21, and BFP-Sec61β channels; BFP-Rab5 and GFP-TfR photobleaching series (Post Bleach, t = 5 s to t = 55 s); line scans at t = 10 s and t = 55 s plotted against Linear Distance (pixels).]



Figure 2. ER Tubules Define Functional Constrictions along Dynasore-Stalled Tubular Endosomes 

(A) Images of live Cos-7 cells expressing mCh-Rab5 and GFP-Sec61β show that dynasore treatment elongates early endosomes while ER is unaffected.

(B) Merged images of a live Cos-7 cell expressing GFP-Rab5, BFP-Sec61β, and mCh-FAM21 following dynasore treatment.

(C) Magnified image of endosome boxed in (B). Images reveal that FAM21 concentrates at positions along the tubular endosome where Rab5-labeling is reduced 
and ER tubules intersect these FAM21 domains. 

exists at minima in fluorescence intensity, which would reflect
a functional constriction at ER contact sites (see model
Figure 2E). Cells were cotransfected with an ER marker (mCh-Sec61β),
an early endosomal marker (BFP-Rab5), and a membrane-bound
cargo marker (GFP-TfR). Next, cells were treated
with dynasore to generate elongated endosomes. We photo- 
bleached GFP-TfR at one end of a tubular endosome and then 
visualized fluorescence recovery from the unbleached direction 
(see model Figure 2E). To confirm tubular endosomes persist 
throughout this process, we covisualized unbleached BFP- 
Rab5. Two examples of TfR fluorescence recovery along the 
tubular endosome are shown in kymographs (Figures 2F and 
2H). In the example shown in Figure 2F, the GFP-TfR is photo- 
bleached as indicated (in the yellow circle), and the fluorescent 
signal recovers rapidly (within 10 s) up to a position marked by 
the yellow arrow. This position corresponds to a contact site 
with the tip of an ER tubule (see merge with the ER on the right). 
Furthermore, line-scan analysis demonstrates that the endo- 
some constriction (Rab5) and the diffusion barrier overlap with 
the ER contact site even as the endosome moves over time (Fig- 
ure 2G, yellow arrows mark the diffusion barrier [D.B.]; also see 
Movie S3). In the example shown in Figure 2H, recovery is also 
rapid up to the position marked by a yellow arrow, which corre- 
sponds to a crossing ER tubule. As before, diffusion was limited 
at the position corresponding to a minimum in Rab5 fluores- 
cence on the endosome, confirming that it is indeed a constric- 
tion (Figures 2H and 2I). This suggests that the bulk flow of cargo 
between compartments is restricted at ER-marked constrictions 
prior to endosome fission. 
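
One way to make the idea of a diffusion barrier concrete is to compute, pixel by pixel along the tubule, how much of the bleached signal has returned and then to look for the point where recovery stops. The Python sketch below is illustrative only. The study reads the barrier from kymographs and line scans; the normalization formula, the 0.5 threshold, the function names, and the intensity values here are assumptions.

```python
def recovery_fraction(pre, post_bleach, later):
    """Per-pixel fraction of the bleached signal that has returned.

    Computed as (later - post_bleach) / (pre - post_bleach). Pixels that
    were not bleached (no signal lost) are reported as fully recovered.
    """
    fractions = []
    for f_pre, f_bleach, f_later in zip(pre, post_bleach, later):
        lost = f_pre - f_bleach
        fractions.append(1.0 if lost <= 0 else (f_later - f_bleach) / lost)
    return fractions


def barrier_position(fractions, threshold=0.5):
    """First pixel, scanning from the unbleached end (index 0), whose
    recovery falls below `threshold`; None if every pixel recovers."""
    for i, frac in enumerate(fractions):
        if frac < threshold:
            return i
    return None


# Made-up GFP-TfR intensities along a tubule; index 0 is the unbleached end.
pre = [1.0] * 8
post = [1.0, 1.0, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
after_10s = [1.0, 1.0, 0.9, 0.8, 0.8, 0.15, 0.1, 0.1]

fractions = recovery_fraction(pre, post, after_10s)
print(barrier_position(fractions))
```

In this toy data, signal recovers up to pixel 4 and fails to recover from pixel 5 onward, which is where a barrier would be placed and then compared with the Rab5 minimum and the ER contact.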

Fission of Tubular Early Endosomes Occurs at ER 
Contact Sites 

ER association with tubular endosome constrictions is strikingly 
reminiscent of contact sites between ER tubules and mitochon- 
drial constrictions that undergo fission (Friedman et al., 2011). 
We therefore tested if ER-marked constrictions along tubular en- 
dosomes are stalled intermediates primed to undergo fission. 
Cells were cotransfected with mCh-Rab5 and GFP-Sec61β and
then treated with dynasore to generate tubular endosomes. To 
capture fission, dynasore was inactivated with fetal bovine 
serum (FBS) and cells were imaged live (Figure 3A). As with 
native early endosomes, the ER contacted elongated endosomes
and defined the position of constriction and fission (Figures
3B and 3C). We visualized 31 individual fission events in
nine cells (Figure 3D; Movie S4). ER tubule crossings were present
at 80.6% of these fission events (Figure 3D). The mean
coverage of the ER network on the tubular endosomes analyzed
was 22.36% (Figures 3E and 3F), which demonstrates that ER



contact at the site of fission is not due to chance. Together, these
data show that ER contact sites are spatially linked to the posi- 
tion of constriction and fission on early sorting endosomes. 
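
The coverage value used as the chance estimate is the fraction of endosome pixels that also contain ER signal in thresholded images. A minimal Python sketch of that calculation follows. It assumes the two masks have already been thresholded and aligned; the function name and the tiny example masks are hypothetical.

```python
def er_coverage(endosome_mask, er_mask):
    """Percent of endosome pixels that are also ER pixels.

    Both arguments are equally sized 2D lists of 0/1 values taken from
    thresholded images.
    """
    endosome_pixels = 0
    overlap_pixels = 0
    for endo_row, er_row in zip(endosome_mask, er_mask):
        for endo, er in zip(endo_row, er_row):
            if endo:
                endosome_pixels += 1
                if er:
                    overlap_pixels += 1
    if endosome_pixels == 0:
        raise ValueError("endosome mask is empty")
    return 100.0 * overlap_pixels / endosome_pixels


# Made-up masks: a horizontal tubular endosome crossed by one ER tubule.
endosome = [[0, 0, 0, 0, 0, 0],
            [1, 1, 1, 1, 1, 0],
            [0, 0, 0, 0, 0, 0]]
er = [[0, 0, 1, 0, 0, 0],
      [0, 0, 1, 0, 0, 0],
      [0, 0, 1, 0, 0, 0]]
print(er_coverage(endosome, er))
```

If fission sites were placed at random along the endosome, the expected fraction of sites under an ER crossing would equal this coverage, which is why it serves as the chance baseline.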

ER Recruitment Is Spatially and Temporally Linked to 
the Position of Late Endosome Fission 

The vast majority of late endosomes (>99%) are also tightly 
associated with the ER over time (Friedman et al., 2013). We 
asked whether ER contact sites are also spatially linked to the 
position of late endosome fission. Cos-7 cells were transfected 
with markers for late endosomes (mCh-Rab7) and the ER 
(GFP-Sec61β) and were imaged live by confocal fluorescence
microscopy. Exogenous mCh-Rab7 was expressed at levels 
similar to endogenous (Figure S2A), and the morphology of 
labeled late endosomes matched previous reports (Figures 4A 
and S2B) (Barbero et al., 2002). Cells were also pulse-labeled 
with EGF (conjugated to Alexa Fluor 647) to mark the internaliza- 
tion and segregation of cargo. We imaged the structure, dy- 
namics, and interactions between the endosomes and the ER 
over time at the periphery of the cell where both organelles are 
well resolved. A representative example of a late endosome 
that undergoes fission is shown in Figures 4B and S2C. Here, a
small Rab7+ compartment buds off of a larger Rab7+ vacuolar
compartment. As expected, EGF is retained in the vacuolar
compartment during constriction and fission (Figures 4B and
4C; Movie S5). After merging the time-lapsed images of endosome
fission with the location of the tubular ER network, we
could observe an ER tubule localized to nearly every fission
site (96.5%; n = 29 from 24 cells; Figure 4D; see also Figure S2D).
In the majority of these cases (93%), the ER was positioned
perpendicular to the site of fission. By line-scan analysis, it becomes
clear that a dynamic ER tubule moves into place to cross
over and “cup” the bud just before fission occurs (compare Pre-ER
to Pre-Fission scans) (Figures 4B and 4C). These data reveal
that ER tubules are both spatially and temporally linked to the 
position of late endosome fission. 

Next, we aimed to generate tubular late endosomes to further 
resolve the position of ER contact relative to the site of fission. 
Consistent with previous reports (Derivery et al., 2009), we could 
generate slightly elongated tubular Rab7-labeled late endo- 
somes by efficiently depleting dynamin-2 (Dnm2) by siRNA treat- 
ment (Figure 4H). Cells were also simultaneously cotransfected 
with mCh-Rab7 and GFP-Sec61β. Prior to imaging, these cells
were pulse-labeled with EGF (conjugated to Alexa Fluor 647) to 
determine whether elongated endosomes could functionally 
partition cargo. Live confocal fluorescence microscopy revealed 
that elongated endosomes in Dnm2-depleted cells were still able 
to take up EGF and undergo fission while retaining EGF in the



(D) Line-scan analysis of relative FI of the endosome shown in (C) confirms that FAM21 and ER colocalize with Rab5-labeled endosome constrictions (marked by
corresponding blue and yellow arrows).

(E) A model outlines the FRAP technique used to test for a cargo diffusion barrier (D.B.) along tubular endosomes.

(F) A live Cos-7 cell expressing BFP-Rab5, mCh-Sec61β, and GFP-TfR (membrane-bound cargo) was treated with dynasore to elongate endosomes. Endosomes
were photobleached in the region indicated (dotted yellow circle). Images were taken at times indicated during the recovery (see kymograph). See also Movie S3.

(G) A graph of relative FI during time points shown reveals that a D.B. limits the recovery of TfR at the position of an ER-marked constriction (at yellow arrow). The D.B./
constriction maintains contact with the ER over time (marked by yellow arrows).

(H and I) Another example as in (F) and (G). Scale bars represent 5 μm in (A) and (B) and 1 μm in (C), (F), and (H).

See also Figure S1.
Figure 3. Dynasore-Stalled Tubular Early Endosomes Undergo Fission at ER Contact Sites

(A) The elongated tubular endosome phenotype that results from dynasore treatment can be reversed with the addition of FBS. Time-lapsed images of a live
Cos-7 cell expressing mCh-Rab5 and GFP-Sec61β following dynasore treatment (t = 0 min) and after FBS addition at times indicated (right panels).

(B) Cell treated as in (A).

(C) Magnified time-lapse images of the region boxed in (B) show two fission events that occur on a single tubular endosome with each event marked by an ER tubule
crossing (blue arrows). Yellow arrowheads mark fission products after each division. See also Movie S4.

(D) The percent of tubular EE fission events marked by ER tubule crossings (out of 31 events from nine cells).

(E) Method for determining the amount of endosome image surface covered by ER tubule crossing for all 31 fission events. An example is shown of endosome and
ER tracing from an image immediately preceding fission. Top row shows indicated fluorescence markers and bottom row shows thresholded images.

(F) Table shows the percentage (22.36%) of endosomal pixels colocalized with pixels from ER tubules crossing the endosome. The right panel shows that the
measured frequency of ER-marked fission (80.6%) is significantly higher than that due to chance (22.36%). ***p < 0.001, Fisher’s exact test. Scale bars represent
5 μm in (A) and (B) and 1 μm in (C) and (E). min, minutes.
vacuolar compartment (Figures 4E and 4F; Movie S6). In every
example recorded (100%, n = 12 from eight cells), ER tubules
crossed perpendicular to the site of late endosome (LE) fission
(Figure 4F). As before, a dynamic ER tubule slides into place
just prior to the fission event (compare white and blue arrows
in two left panels). We measured the mean coverage of the ER
network on tubular endosomes to be 21.24% for the 12 events
captured (Figures 4I and 4J). Thus, the average probability of
ER-marked fission due to chance (21.24%) is significantly lower
than the measured frequency (100%) of ER-marked endosome
fission.
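
To give a feel for how unlikely these frequencies are under chance alone, the sketch below computes a one-sided binomial tail probability in Python. Note that this is a substituted technique: the study reports Fisher's exact test, whereas the binomial tail simply asks how often at least k of n fission sites would fall under an ER crossing if each site had a probability equal to the measured coverage. The early endosome count of 25 is inferred from the reported 80.6% of 31 events, and the function name is hypothetical.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(k, n + 1))


# Early endosomes: 25 of 31 events ER-marked (80.6%), coverage 22.36%.
early = binomial_tail(25, 31, 0.2236)

# Late endosomes in Dnm2-depleted cells: 12 of 12 events, coverage 21.24%.
late = binomial_tail(12, 12, 0.2124)

print(f"early endosomes: {early:.3g}")
print(f"late endosomes:  {late:.3g}")
```

Both tail probabilities fall far below 0.001, consistent with the conclusion that ER contact at fission sites is not explained by ER coverage alone.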

ER Tubules Contact Cargo-Sorting Domains Prior to 
Fission 

Next, we aimed to mark endosomal sorting domains prior to 
fission and ask if this position also coincided with the position 
of ER recruitment. The retromer complex sorts cargo into a 
domain on the late endosome that will undergo fission and traffic 
to the Golgi (Arighi et al., 2004; Seaman, 2004). FAM21 binds and 
colocalizes with these retromer-mediated sorting domains (Derivery
et al., 2009; Gomez and Billadeau, 2009). We thus asked
whether ER tubules colocalized with FAM21-marked sorting domains
on late endosomes. Cells were transfected with markers
of ER (BFP-Sec61β), late endosomes (GFP-Rab7), and the
WASH complex (mCh-FAM21). FAM21 effectively partitions to
both punctate and tubular structures on Rab7+ late endosomes
that are indicative of retromer sorting domains (Figure 5A). We
tracked 165 FAM21 structures discretely localized to budding
late endosome sorting domains (Figures 5B and 5C; Movie
S7). Of these, 80% colocalize with the site
of ER tubule contact over time (see example in Figure 5B).
Thus, most but not all FAM21-marked retromer domains track
together with ER tubules, which suggests that the ER is recruited 
after the sorting domain has formed. 

We next observed whether the ER is present at FAM21-marked
sorting domains when endosomes are undergoing
fission (Figures 5D-5F; Movie S8). The ER was localized to
97% of FAM21-marked fission events (n = 36 from 31 cells; Figure
5G). In the example shown, it is important to note that a dynamic
ER tubule becomes associated with a FAM21-marked
sorting domain just prior to constriction and fission (Figure 5E,
compare white and blue arrows at 15 and 20 s, respectively;
Movie S8). The temporal recruitment of ER to the position of
constriction/fission is also clear by line-scan analysis (Figure 5F,
see the peak in ER fluorescence at Pre-Fission). Together, these
data demonstrate that ER contact is not required for retromer
sorting domain formation or FAM21 recruitment, but that ER
contact sites are both spatially and temporally linked to the
fission events at FAM21-marked sorting domains.

We have shown that ER tubules make contact with FAM21- 
marked sorting domains prior to fission. We thus wondered if 
ER contact with endosomes requires nucleation of actin by the 
WASH complex. FAM21 binds to the retromer on endosomes; 
it is the first component of the WASH complex to be recruited, 
and does so independently of other WASH complex members 
(Gomez and Billadeau, 2009). We could thus deplete WASH1,
which disrupts WASH complex assembly and actin nucleation
(Derivery et al., 2009), and ask whether the ER is still recruited to
FAM21-marked sorting domains and, if so, whether fission still
occurs. We efficiently depleted WASH1 by siRNA and captured
25 FAM21-marked fission events, of which all but one localized to
an ER contact site (n = 25 from 21 cells; Figure S3). This demonstrates
that the ER is recruited to functional endosome fission
sites independent of WASH complex assembly or WASH-mediated
actin nucleation.

We have shown several examples where a dynamic ER move- 
ment establishes contact with a sorting domain immediately 
prior to fission. ER tubule dynamics are frequent and occur 
on microtubules (MTs) (Friedman et al., 2010; Waterman- 
Storer and Salmon, 1998). Thus, we probed the effect of MT 
depolymerization on ER-endosome contact. Previously, we 
have shown that endosomes will maintain contact with the ER 
following MT depolymerization (Friedman et al., 2013). Here, 
we tested whether ER contacts would also maintain contact 
with endosome constrictions following MT depolymerization 
(Figure S4). We show several examples where MTs are depoly- 
merized but ER contact still localizes to endosome constriction 
sites. We conclude that MT-dependent ER dynamics may be 
necessary to initiate contact between ER tubules and the 
constriction site. However, once formed, MTs do not appear 
necessary to maintain contact. 

ER Is Recruited to the Endosome Immediately Prior to 
Fission 

We show that tubular sorting domains can be observed prior to 
ER contact and fission. Often ER tubules “attack” the site of the 
preformed sorting domain resulting in fission (Figures 6A and 
6B). In the two complementary examples shown, fission is
observed within 10 s of contact formation (Figures 6A and 6B).
Many mechanisms have been identified to form the budding 
domain and sort cargoes (Figure 6C). Microtubule motors 
have been implicated in sorting and trafficking cargoes (Hunt 
et al., 2013). Sorting nexins can drive the formation of budding 
tubules and stabilize them via curvature sensing BAR domains 
(Cullen, 2008). The retromer complex interacts with the sorting 
nexins and is involved in sorting of cargoes (Pfeffer, 2009; Sea- 
man et al., 2013). FAM21 binds the retromer and is responsible 
for recruitment of remaining WASH complex components (Go- 
mez and Billadeau, 2009). The WASH complex stimulates 
Arp2/3 at endosomal sorting domains thereby activating actin 
polymerization (Derivery et al., 2009). The presence of actin 
patches can also help to segregate select cargoes into the 
bud (Puthenveedu et al., 2010). The role of these multiple factors
is to generate sorting domains to traffic cargo. Here, we 
show that following sorting, the ER establishes contact with 
the endosome at sites that are spatially and temporally linked 
to endosome fission. 

Disruption of ER Dynamics and Shape Affects the Rate 
of Endosome Fission 

Our data support the following order of events: the budding sorting
domain forms, ER tubule contact is established at the interface
between the endosome and the bud, and then ~5-10 s later
fission occurs at this contact site (Figures 6A and 6B). Consistent 
with this model, we show several compelling examples where a 
dynamic ER tubule “attacks” the future fission site. Thus, our 
prediction was that altering the shape and dynamics of the peripheral
ER tubules would lead to a reduction in the frequency
of late endosome fission. We chose to disrupt ER shape and dy- 
namics by overexpressing human Reticulon 4a (Rtn4a), which 
has previously been shown to generate highly elongated and un- 
branched ER tubules (Shibata et al., 2008; Voeltz et al., 2006). 
Elongated tubules are coated with immobilized Rtn4a oligomers 
that can exclude other ER luminal and membrane proteins from 
the peripheral ER (Shibata et al., 2008; Voeltz et al., 2006; Zurek 
et al., 2011). 

Cells were cotransfected with the general luminal ER marker 
BFP-KDEL, mCh-Rab7 to visualize endosome fission, and 
with or without high levels of Rtn4a-GFP. Compared to control, 
the Rtn4a-GFP cells had an altered ER morphology. As ex- 
pected, the peripheral ER tubules were highly elongated, un- 
branched, and luminal BFP-KDEL was mostly excluded from 
these elongated Rtn4a-GFP tubules (Figure 7A, compare 
Rtn4a-labeled top panels to control bottom panels). ER tubules 
were also significantly less dynamic with Rtn4a overexpression. 
This was measured quantitatively for all cells analyzed using the 
Pearson’s correlation coefficient of overlaid 5 μm square boxes
from two frames taken 60 s apart (Figures 7B and 7C) (French 
et al., 2008). By this measurement, the time-lapse images of 
the ER in Rtn4a-GFP expressing cells were more colocalized 
over time because the peripheral ER moved less, thus resulting 
in a higher Pearson’s correlation (control, n = 22; Rtn4a, n = 20). 
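
The dynamics measurement described above compares the same region at two time points: a static ER gives nearly identical images and a correlation near 1, while a mobile ER gives a lower value. The Python sketch below shows the calculation on toy data. It is an illustration under stated assumptions rather than the authors' implementation; the function name and the 3 x 3 example frames are hypothetical, and real boxes would be pixel intensities from 5 μm regions imaged 60 s apart.

```python
from math import sqrt


def pearson(frame_a, frame_b):
    """Pearson's correlation coefficient between two equally sized
    2D intensity arrays (lists of rows)."""
    a = [value for row in frame_a for value in row]
    b = [value for row in frame_b for value in row]
    if len(a) != len(b) or not a:
        raise ValueError("frames must be non-empty and the same size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant frame")
    return cov / sqrt(var_a * var_b)


# Toy frames: a vertical ER tubule that either stays put or shifts sideways.
frame_t0 = [[0, 9, 0], [0, 9, 0], [0, 9, 0]]
frame_static = [[0, 9, 0], [0, 9, 0], [0, 9, 0]]
frame_moved = [[0, 0, 9], [0, 0, 9], [0, 0, 9]]

print(pearson(frame_t0, frame_static))
print(pearson(frame_t0, frame_moved))
```

The tubule that stays in place correlates perfectly with itself, while the shifted tubule yields a much lower coefficient, which is the sense in which a higher value indicates a less dynamic ER.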
We tested whether the change in ER morphology caused by 
overexpressing Rtn4a impacted endosome fission. We re- 
corded 2 min movies of hundreds of individual late endosomes. 
For each endosome, we determined whether it formed a 
budding domain during the course of the movie. For each endo- 
somal bud observed, we measured the size of the endosome, 
the maximum bud length, and whether it underwent fission (con- 
trol, n = 257; Rtn4a, n = 257). We found that altering ER shape 
did not significantly affect endosome size or bud length (Figures 
7D and 7E). However, we found that the number of buds that un- 
derwent fission was significantly reduced with Rtn4a overex- 
pression (Figure 7F). Thus, when ER membrane composition, 
shape, and dynamics are altered, the process of endosome 
fission is defective. 



DISCUSSION 

We have shown here that ER tubules form stable contact sites 
with sorting domains on early and late endosomes and that ER 
contact is a defining feature of endosome fission. These results 
are strikingly analogous to previous data showing that ER 
contact sites define the position of mitochondrial constriction 
and division (Friedman et al., 2011). Together, these reports 
demonstrate that the ER has an important and nontraditional 
role in the cell: to regulate the dynamics and biogenesis of at least 
two unique cytoplasmic organelles. How could an ER contact 
site regulate the constriction and/or fission of another organelle? 
We favor three possibilities. First, contact sites could provide a 
platform for the recruitment of necessary factors from each 
organelle that are required to drive fission. Second, because 
the ER is the site of synthesis for most phospholipids, it is 
possible that direct translocation of lipids from the ER into the en- 
dosomal membrane at contact sites generates a region of high 
lipid membrane curvature that would promote fission. The ER 
is also home to lipid modifying enzymes that may transform lipids 
on the endosomal membrane to similarly drive localized changes 
in membrane curvature. A third favored model is that ER stores 
could provide high local concentrations of Ca2+ required to activate
a Ca2+-dependent activity on the endosome and regulate
fission. Indeed, these mechanisms are not mutually exclusive 
and may all contribute to organelle fission at contact sites. 

From an evolutionary point of view, one might think the ability 
of the ER to divide neighboring organelles would be advanta- 
geous for expedited ER dynamics within the cell or for whole- 
cell rearrangements that occur during mitosis. Both are daunting 
tasks considering the density of the organelles in the intracellular 
environment. However, the position of the ER at the site of fission 
appears to be purposeful for fission of endosomes as well as for 
mitochondria. We have observed that the ER settles to form sta- 
ble contact sites with the divide between sorting compartments 
prior to fission. Likewise, on mitochondria, the ER is also pur- 
posefully localized to the position of nucleoid segregation prior 
to mitochondrial division (Murley et al., 2013). 

How ER tubules bind to fission sites on endosomes (and even 
on mitochondria) is not known and remains an open and 



Figure 4. Late Endosome Division Occurs at ER Contact Sites 

(A) A Cos-7 cell expressing mCh-Rab7 (late endosome) and GFP-Sec61β was pulse-labeled with EGF conjugated to Alexa Fluor 647 (cargo in blue).

(B) Magnified image of the region boxed in (A) shows an example of late endosome fission. Merged images show the relative location of Rab7, EGF, and the ER, as 
indicated over time. See also Movie S5. 

(C) Traced outline and the corresponding line-scan analysis of relative FI through the equator of the dividing endosome shown in (B). Relative FI of Rab7, ER, and 
EGF were performed for time points: t = 0 s (Pre-ER), t = 35 s (Pre-Fission), and t = 40 s (Post-Fission). Note that a dynamic ER tubule is recruited to the position 
and timing of endosome constriction and fission (compare position of the ER tubule marked by an arrow at t = 0 s to t = 20 s in B and C). 

(D) Percent of late endosome division events that co-localize with ER tubules (n = 29 from 24 cells). 

(E) Image of a Dnm2-depleted cell expressing mCh-Rab7 (red) and GFP-Sec61β (green) and pulse-labeled with EGF conjugated to Alexa Fluor 647 (blue).

(F) Magnified image of box in (E) shows an elongated endosome that moves until it becomes associated with an ER tubule (white arrow) at the site of endosome 
division (blue arrow). The location of the Rab7 endosome bud (yellow arrowhead) is shown. See also Movie S6. 

(G) The percent of late endosome fission events that colocalize with ER tubules in the absence of dynamin-2 (out of 12 events from eight cells). 

(H) Immunoblot analysis shows efficient depletion of Dnm2 in cells transfected with Dnm2 siRNA (right lane) relative to control (left lane). 

(I) The endosome image surface covered by ER tubule crossing was measured for all 12 fission events. In example shown, the top row shows indicated 
fluorescence markers and bottom row shows thresholded images. 

(J) Table summarizes predicted frequency of ER-marked LE fission (21.24% based on coverage) versus the actual frequency of ER-marked fission (100%). 
***p < 0.001, Fisher’s exact test. Scale bars represent 5 μm in (A) and (E) and 1 μm in (B) and (F).

See also Figure S2. 
Figure 5. The ER Is Recruited to FAM21-Marked Sorting Domains Prior to Fission

(A) Merged image of a live Cos-7 cell expressing GFP-Rab7, BFP-Sec61β, and mCh-FAM21 that was also pulse-labeled with EGF conjugated to Alexa Fluor 647
(cargo in blue).

(B) Magnified time-lapse images of an endosome in (A) show the tip of an ER tubule tracking with a FAM21 punctum at the bud of a late endosome (yellow
arrowhead). See also Movie S7.

(C) The number of FAM21 puncta on late endosome buds that maintain contact with ER over a 2 min time course (165 puncta from 31 cells).

(D) Merged image of a live Cos-7 cell treated as in (A).

(E) Magnified images of the boxed region in (D) show a late endosome undergoing fission. A bud labeled by FAM21 extends from the late endosome and undergoes
constriction and division. An ER tubule (follow white arrow) is recruited to the FAM21 bud just before fission (at the blue arrow). The location of the exiting Rab7
endosome bud is marked by a yellow arrowhead. See also Movie S8.

(F) Line-scan analysis of the endosome shown in (E) relates the timing and position of ER tubule recruitment relative to Rab7, FAM21, and EGF.

(G) The percent of FAM21-marked late endosome division events that colocalize with ER tubules during cargo segregation (n = 36 from 31 cells). Scale bars
represent 5 μm in (A) and (D) and 1 μm in (B) and (E).

See also Figures S3 and S4. 
important question to answer. Although several partners have
been identified that can bridge the apposing membranes, 
including PTP1B-EGFR, VAP-A-ORP1L, and STARD3-ORP1L 
(Alpy et al., 2013; Eden et al., 2010; Rocha et al., 2009), none 
have been shown to be required for maintenance of contact. 
Thus, these could all be independent tethers for independent 
functions or these complexes could function at sites that are 
tethered by yet unidentified factors. There are likely multiple 
points of contact between the ER and endosomes that could co- 
ordinate functions. For example, previous immunoelectron mi- 
croscopy has also shown colocalization of EGFR and PTP1B 
at the ER-endosome interface (Eden et al., 2010; Haj et al., 
2002). By live confocal microscopy, we also find that EGF within 
endosomes frequently tracks with the position of ER contact 
(Figure S5A). Therefore, we tested if ER-marked endosome 
fission occurs at the same contact sites as EGF clustering. We 
found that EGF puncta localized to less than half of the ER- 
marked fission events (Figure S5B). These data demonstrate 
that multiple functional and structural contact sites exist be- 
tween the ER and endosomes to coordinate different functions 
during cargo sorting and fission. The machinery at ER-marked 
endosome fission sites could be especially difficult to identify 
because, in many examples, these contacts are established 
immediately prior to the fission event and thus are likely to be 
very transient. The identification of this transient machinery will 
be a major next goal. 

EXPERIMENTAL PROCEDURES 
Plasmid Construction 

GFP-Sec61β (Shibata et al., 2008), BFP-Sec61β (Zurek et al., 2011), 
mCh-Sec61β (Zurek et al., 2011), mCh-Rab5B (Friedman et al., 2010), 
BFP-Rab5B (Friedman et al., 2013), Rtn4a-GFP (Shibata et al., 2008), and 
mCh-αTubulin (Friedman et al., 2010) were previously described. GFP-Rab4B 
was a gift from A.R. English (University of Colorado-Boulder) and 
was generated by PCR amplifying human Rab4B (National Center for Biotechnology 
Information [NCBI] accession number NM_016154.4) and cloning it into the BglII/KpnI sites of 
pAcGFP1-C1 (Clontech). GFP-Rab5B was a gift from J.R. Friedman (University 
of California-Davis) and was generated by PCR amplifying human 
Rab5B (NCBI accession number NM_002868.3) and cloning it into the XhoI/BamHI 
sites of pAcGFP1-C1 (Clontech). To generate GFP-Rab7A, human 
Rab7A (NCBI accession number NM_004637.5) was PCR amplified and 
cloned into the XhoI/HindIII sites of the pAcGFP1-C1 vector. mCh-Rab7A was 
made like GFP-Rab7A, but mCh was cloned into the NheI/XhoI sites of 
pAcGFP1-C1, replacing the GFP. mCh-FAM21 and BFP-FAM21 were generated 
by subcloning mCh or BFP into the BamHI and MluI sites while removing 
HA-YFP from the shFAM21/HA-YFP-FAM21 rescue vector provided by Dr. 
Daniel Billadeau and described in Gomez and Billadeau (2009). GFP-TfR was 
generated by Dr. Lois Greene. 

Cell Growth and Plasmid DNA Transfections 

Cos-7 cells (ATCC) were grown in DMEM media (Invitrogen) supplemented 
with 10% FBS and 1% penicillin/streptomycin. Cells were seeded in a 60 x 
15 mm dish at 5.0 x 10^ cells ~16 hr prior to transfection. Cells were transfected 
~24 hr prior to imaging with plasmid DNA in Opti-MEM media (Invitrogen) 
with 12.5 μl of Lipofectamine 2000 reagent (Invitrogen) according to the 
manufacturer’s instructions. After ~5 hr of transfection, cells were seeded in 
35 mm glass-bottom microscope dishes (MatTek) at 2.0 x 10^ cells. Cells 
were imaged in 37°C Opti-MEM. For all experiments, the following amounts 
of plasmid DNA were transfected into cells: 500 ng/ml 
BFP-Sec61β; 250 ng/ml mCh-Sec61β and GFP-Sec61β; 25 ng/ml GFP-Rab4; 
50 ng/ml BFP-Rab5 and GFP-Rab5; 125 ng/ml mCh-Rab5; 20 ng/ml 
mCh-Rab7; 250 ng/ml mCh-FAM21; 500 ng/ml BFP-FAM21; 165 ng/ml 
GFP-TfR; 500 ng/ml Rtn4a-GFP; 250 ng/ml BFP-KDEL; and 62.5 ng/ml 
mCh-αTubulin. 

RNAi Transfection and Western Blot 

Dynamin-2 was depleted using an ON-TARGETplus Human DNM2 (1785) 
siRNA SMARTpool (Thermo). WASH was depleted using ON-TARGETplus 
Human WASH1 (100287171) siRNA SMARTpool (Thermo). Cells were seeded 
in a 60 x 15 mm dish at 5.0 x 10^ cells ~16 hr prior to first transfection. Cells 
were first transfected ~72 hr prior to imaging with 12.5 μl Dharmafect (Thermo) 
in DMEM with 10% FBS and 25 nM RNAi oligonucleotides or 25 nM Silencer 
Negative Control #1 siRNA (Ambion AM4635). After ~6 hr of transfection, cells 
were washed and media was replaced with DMEM supplemented with 10% 
FBS and 1% penicillin/streptomycin. Cells were transfected again ~24 hr prior 
to imaging with plasmid DNA as described before with the addition of 25 nM 
RNAi oligonucleotides or 25 nM Silencer Negative Control #1 siRNA. After 
~5 hr of transfection, cells were seeded in 35 mm glass-bottom microscope 
dishes (MatTek) at 2.0 x 10^ cells. Cells were imaged in 37°C Opti-MEM. 

Whole cell lysates of Cos-7 cells were resuspended in Laemmli sample 
buffer, boiled for 10 min, separated by SDS-PAGE, and transferred to a 
PVDF membrane. Primary antibody concentrations were used as follows: 
Rab7 (Cell Signaling 99367S) 1:1,000; Dynamin-2 (Abcam ab3457) 1:15,000; 
WASH1 (gift from D. Billadeau) 1:6,000; GAPDH (Sigma-Aldrich G9545) was 
used as a loading control at 1:20,000. HRP-conjugated goat anti-rabbit secondary 
antibody (Sigma-Aldrich) was used at 1:3,000, and signal was detected 
with SuperSignal West Pico Chemiluminescent Substrate (Thermo). Band 
density was estimated using ImageJ (NIH) Gel Analyzer. 

Confocal Microscopy 

Live-cell imaging was performed with an inverted fluorescence microscope 
(TE2000-U; Nikon) equipped with an electron-multiplying charge-coupled 
device (CCD) camera (Cascade II; Photometrics) and a Yokogawa spinning-disk 
confocal system (CSU-Xm2; Nikon). Images were taken with a 100x numerical 
aperture 1.4 oil objective (Nikon). While imaging, live cells were kept at 
37°C in a live-cell incubation chamber (Pathology Devices). Images were acquired 
with MetaMorph 7.0 (MDS Analytical Technologies), analyzed, merged, 
and contrasted using ImageJ, as well as contrasted and converted to 400 dpi 
using Photoshop (Adobe). Scale bars were generated using ImageJ. Supplemental 
videos were generated using ImageJ. 

Endosome Fission and ER Coverage Analysis 

Cos-7 cells expressing markers for ER and endosomes were visualized live 
by confocal fluorescence microscopy every 5 s for 2 min in a single focal 
plane. Fission events were counted if they were in regions of the cell with 
resolvable ER tubules. Association of ER tubules was classified as adjacent, 
crossing, or not associated. Line-scans were performed using ImageJ by 
drawing a line the width of the budding compartment along the length of 
the entire endosome. Fission of dynasore-induced elongated endosomes 
(Figures 3B-3D) and Dynamin-2 siRNA-induced elongated endosomes (Fig- 
ures 4E-4G) was analyzed if ER coverage was measured as <35% of the total 
endosome area. 

To predict the frequency of ER at the site of fission if fission could occur at 
any ER crossing along the endosome, we measured the area of crossing ER 
tubules versus total endosome area from the image immediately preceding 
endosome fission for all events. The area was measured using the adaptive 
threshold plugin for ImageJ (created by Qingzong Tseng), which converts 
the 8-bit images to binary form defining the ER or endosome structure. We 
found the percent of endosome area covered by crossing ER tubules by 
comparing the number of pixels positive for ER signal to total number of pixels 
from the endosome area. We only selected the regions of ER that crossed the 
elongated endosome, omitting any ER tubules that ran parallel to the 
endosome. 
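
The coverage measurement described above reduces to counting pixels in two binary masks. The following is a minimal sketch of that arithmetic in Python with NumPy, not the ImageJ adaptive threshold plugin itself; the function names and the assumption that both masks are already thresholded and restricted to crossing ER tubules are illustrative only.

```python
import numpy as np


def er_coverage_percent(er_mask, endosome_mask):
    """Percent of endosome pixels that are also positive for ER signal.

    Both inputs are assumed to be binary masks of identical shape, with the
    ER mask already limited to tubules that cross the endosome.
    """
    er = np.asarray(er_mask, dtype=bool)
    endo = np.asarray(endosome_mask, dtype=bool)
    if er.shape != endo.shape:
        raise ValueError("masks must have the same shape")
    endosome_pixels = int(endo.sum())
    if endosome_pixels == 0:
        raise ValueError("endosome mask is empty")
    overlap_pixels = int((er & endo).sum())
    return 100.0 * overlap_pixels / endosome_pixels


def below_coverage_cutoff(percent, cutoff=35.0):
    """True if an endosome meets the <35% ER coverage criterion."""
    return percent < cutoff


# Toy example: a 10-pixel tubular endosome crossed by ER at 2 pixels.
endosome = np.zeros((3, 10), dtype=bool)
endosome[1, :] = True
er = np.zeros((3, 10), dtype=bool)
er[:, 3] = True
er[:, 7] = True
coverage = er_coverage_percent(er, endosome)
```

If fission could occur at any ER crossing, the measured coverage is also the expected fraction of fission sites that would coincide with ER by chance, which is how the prediction above is framed.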

Fluorescent Cargo Treatment and Imaging 

Prior to imaging, cells were incubated in 37°C growth media containing either 
1 μg/ml EGF conjugated to Alexa Fluor 647 (Invitrogen) or 50 μg/ml transferrin 
conjugated to Alexa Fluor 647 (Invitrogen) for 5-10 min. Then, cells were 

Figure 6. Model of Endosome Sorting and Fission Factors 

(A) A live Cos-7 cell expressing mCh-Rab7, GFP-Sec61β, and BFP-FAM21. Endosome fission occurs after a dynamic ER tubule contacts the endosome at a 
constricted sorting domain. 

Figure 7. ER Structure and Dynamics Affect Endosome Fission 

(A) Live Cos-7 cells expressing BFP-KDEL and mCh-Rab7 with (top panel) or without (bottom panel) Rtn4a-GFP. Note that Rtn4a-GFP tubules are less branched 
(top first panel) and exclude KDEL from the periphery (top second panel). 

(B) Zoom from (A) shows overlay of t = 0 s (red) with t = 60 s (green) to show ER movement over time (“dynamics”). 

(C) Graph of average Pearson’s correlation coefficient shows greater colocalization and therefore fewer dynamic ER movements for Rtn4a-expressing cells 
(control, n = 22 cells; Rtn4a, n = 20 cells; ***p < 0.001, two-tailed t test). 

(D) Graph showing no significant difference in endosome diameter between control and Rtn4a-expressing cells (control, n = 216 endosomes; Rtn4a, n = 215 
endosomes). 

(E) Graph showing no significant difference in endosome bud length between control and Rtn4a-expressing cells (control, n = 257 buds in 22 cells; Rtn4a, n = 257 
buds in 20 cells). 

(F) Graph showing a significant (**p < 0.01, two-tailed t test) decrease in the percent of endosome buds from (E) that undergo productive fission in Rtn4a-expressing 
cells. Scale bars represent 5 μm in (A) and 1 μm in (B) and (G). Error bars represent SEM. 



washed two times in 37°C Opti-MEM and imaged for less than 30 min to 
localize transferrin or EGF to early endosomes or after 30 min to localize 
EGF to late endosomes. 

Dynasore Treatment and Imaging 

Cells were passaged and transfected as described above. Cells were then 
washed two times and incubated in serum-free, antibiotic-free DMEM containing 
80-160 μM dynasore (Sigma-Aldrich) for 30 min prior to imaging. After 
locating cells containing tubular endosomes by confocal microscopy, 10% 
FBS was added to inactivate dynasore thereby preventing adverse effects 
and restoring fission. 

To label cargo in dynasore-treated cells, cells were simultaneously incubated 
with either 1 μg/ml EGF conjugated to Alexa Fluor 647 (Invitrogen) or 
50 μg/ml transferrin conjugated to Alexa Fluor 647 (Invitrogen) for the duration 
of the dynasore incubation, saturating the endosomes with cargo. Following 
the incubation, cells were washed two times with serum-free, antibiotic-free 
DMEM to remove excess cargo and the media was replaced with serum-free, 
antibiotic-free DMEM containing 80-160 μM dynasore for imaging. 

Diffusion Barrier Identification by Photobleaching 

Cos-7 cells were transfected with GFP-TfR, BFP-Rab5, and mCh-Sec61β as 
described above. Cells were dynasore-treated as described above. Once a 
cell containing tubular endosomes was identified, a circular region of interest 
30 pixels in diameter was created using MetaMorph 7.0. Using the Mosaic Dig- 
ital Diaphragm System for Nikon TE2000 Microscope (Photonic Instruments), 
an image was captured prebleach and the region of interest was photo- 
bleached for 2 s (Uniblitz VCM-D1 shutter, Vincent Associates) using an Argon 
laser with a wavelength of 450-515 nm (National Laser). An image was taken 
postbleach followed by a 1 min movie of all expressed channels with 5 s 
intervals. 



(B) A Cos-7 cell expressing mCh-Rab7 and GFP-Sec61β was pulse-labeled with EGF conjugated to Alexa Fluor 647. The ER tubule contacts the endosome, 
fission occurs at the point of contact, and the budding domain leaves while maintaining ER contact. 

(C) The many factors implicated in endosome sorting and fission are indicated in the model pre- and post-ER recruitment. Following sorting, the ER establishes 
contact with the endosome at sites that are spatially and temporally linked to endosome fission. 

See also Figure S5. 

To analyze the pattern of recovery and thus any barriers to diffusion, line-scans 
and kymographs were created using ImageJ. We identified constrictions 
marked by ER tubule crossings by drawing a line the width of the tubular 
endosome being analyzed and measuring the mean fluorescence intensity 
along the length of the tubular endosome. 

Measuring Dynamic ER Movements 

Live Cos-7 cells were imaged every 5 s for 2 min. Raw images taken 60 s apart 
were converted to 8 bit and overlaid using ImageJ. The average Pearson’s correlation 
coefficient was determined for three peripheral 5 μm boxes for each 
cell using the PSC Colocalization plugin for ImageJ (French et al., 2008). A 
higher degree of colocalization (Pearson’s correlation coefficient closer to 
1.0) between the images over time is indicative of a less dynamic ER. 
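
As an illustration of this readout, the sketch below computes a plain Pearson correlation between the same region of two frames and averages it over several boxes. It is a simplified stand-in written in Python with NumPy, not the PSC Colocalization plugin (which may apply its own thresholding); the function names and the (row, column, size) box convention are assumptions made for the example.

```python
import numpy as np


def pearson(frame_a, frame_b):
    """Pearson's correlation coefficient between two equally sized images."""
    a = np.asarray(frame_a, dtype=float).ravel()
    b = np.asarray(frame_b, dtype=float).ravel()
    if a.shape != b.shape:
        raise ValueError("frames must have the same shape")
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    if denom == 0:
        raise ValueError("at least one frame has no intensity variation")
    return float((a * b).sum() / denom)


def mean_box_pearson(frame_t0, frame_t60, boxes):
    """Average the coefficient over square boxes given as (row, col, size)."""
    values = [
        pearson(frame_t0[r:r + s, c:c + s], frame_t60[r:r + s, c:c + s])
        for r, c, s in boxes
    ]
    return float(np.mean(values))


# Toy frames: an ER that does not move at all gives a coefficient of 1.0.
static_er = np.arange(36, dtype=float).reshape(6, 6)
same = mean_box_pearson(static_er, static_er, [(0, 0, 3), (3, 3, 3), (0, 3, 3)])
```

A value near 1.0 means the two time points look alike in that box, so lower averages correspond to more ER rearrangement over the 60 s interval.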

Nocodazole Treatment and Imaging 

Dynamin-2 was depleted as described above to enrich for elongated endo- 
somes. Cells were passaged and transfected as described above. Cells 
were then washed and incubated in 37°C Opti-MEM prior to imaging. After 
locating cells containing elongated endosomes by confocal microscopy, a z 
stack was acquired. Then, 5 μM nocodazole (Acros Organics) was added for 
at least 15 min or until the majority of microtubules had depolymerized. A z 
stack was taken posttreatment to observe the position of the ER relative to 
endosome constrictions. 

SUPPLEMENTAL INFORMATION 

Supplemental Information includes five figures and eight movies and can be 
found with this article online at http://dx.doi.org/10.1016/j.cell.2014.10.023. 

SUMMARY 

The eukaryotic chaperonin TRiC (also called CCT) is 
the obligate chaperone for many essential proteins. 
TRiC is hetero-oligomeric, comprising two stacked 
rings of eight different subunits each. Subunit diversi- 
fication from simpler archaeal chaperonins appears 
linked to proteome expansion. Here, we integrate 
structural, biophysical, and modeling approaches 
to identify the hitherto unknown substrate-binding 
site in TRiC and uncover the basis of substrate re- 
cognition. NMR and modeling provided a structural 
model of a chaperonin-substrate complex. Muta- 
genesis and crosslinking-mass spectrometry vali- 
dated the identified substrate-binding interface and 
demonstrate that TRiC contacts full-length sub- 
strates combinatorially in a subunit-specific manner. 
The binding site of each subunit has a distinct, evolu- 
tionarily conserved pattern of polar and hydrophobic 
residues specifying recognition of discrete substrate 
motifs. The combinatorial recognition of polypep- 
tides broadens the specificity of TRiC and may direct 
the topology of bound polypeptides along a produc- 
tive folding trajectory, contributing to TRiC’s unique 
ability to fold obligate substrates. 

INTRODUCTION 

The health and integrity of the cellular proteome depend on mo- 
lecular chaperones, which through their distinct substrate speci- 
ficities and modes of action maintain protein homeostasis (Balch 
et al., 2008; Kim et al., 2013; Li and Buchner, 2013; Saibil, 2013). 
Among these, the eukaryotic chaperonin TRiC (for TCP-1 ring 
complex, also called CCT for chaperonin containing TCP1) is 
distinguished by its complex architecture and mechanism, which 
allow it to fold a subset of essential and topologically complex proteins, 
including cell-cycle regulators, signaling proteins, and cytoskeletal 
components (Bigotti and Clarke, 2008; Kim et al., 2013). 

TRiC/CCT is a large hetero-oligomeric ATP-dependent complex 
consisting of two eight-membered rings stacked back to 
back (Bigotti and Clarke, 2008; Marti et al., 2011; Spiess et al., 
2004). Each ring creates a central chamber where substrate 
polypeptides bind and fold. Unlike simpler archaeal chapero- 
nins, TRiC contains eight different paralogous subunits, named 
CCT1-CCT8, at fixed positions within each ring (Kalisman 
et al., 2012; Leitner et al., 2012). All subunits are structural homologs 
that consist of an ATP-binding equatorial domain and a substrate-binding 
apical domain linked by an intermediate domain 
(Bigotti and Clarke, 2008; Spiess et al., 2004) (Figure 1A). Each 
subunit also contains an apical segment that forms a lid over 
the cavity. An ATP-driven conformational cycle links TRiC-mediated 
folding to opening and closure of the lid, encapsulating the 
substrate in the cavity (Cong et al., 2012; Meyer et al., 2003; 
Reissmann et al., 2007, 2012). 

Understanding how TRiC recognizes its substrates has impor- 
tant implications for human health (Balch et al., 2008). TRiC inter- 
acts with approximately 10% of the proteome and is essential 
for viability (Yam et al., 2008). Mutations in CCT5 and CCT4 
are linked to sensory neuropathy (Bouhouche et al., 2006). Can- 
cer-linked proteins p53, von Hippel Lindau tumor suppressor 
(VHL), and STAT3 are also TRiC substrates (Kasembeli et al., 
2014; Trinidad et al., 2013), and mutations in the TRiC-binding 
sites of VHL lead to misfolding and tumorigenesis (Feldman et al., 
1999, 2003). TRiC also suppresses aggregation and toxicity of 
Huntingtin in Huntington’s disease (Behrends et al., 2006; Kita- 
mura et al., 2006; Tam et al., 2006, 2009). TRiC is also important 
for folding viral proteins and required for replication of important 
human pathogens, including HCV and HIV (Inoue et al., 2011; 
Zhou et al., 2008). In HIV, TRiC interacts with proteins Gag, Vif, 
and p6 (Hong et al., 2001; Jager et al., 2012). 

The unique architecture and mechanistic features of TRiC set 
it apart from other chaperones. The diversification of subunits in 
TRiC is likely central to understanding why many essential proteins, 
such as actin, Cdc20, and Cdh1, can only be folded with assistance 
from TRiC (Marti et al., 2011; Spiess et al., 2004). Despite 
their extensive conservation in the ATP-binding domains, TRiC 
subunits have widely divergent functions within the ATP-driven 
cycle (Reissmann et al., 2012). Additionally, the surface properties 
of the different subunits result in an asymmetric distribution 

Figure 1. Kinetic Analysis of Substrate Motif Recognition by TRiC Apical Domains 

(A) TRiC/CCT subunit domain architecture. 

(B) Substrate polypeptides bind multiple TRiC subunits through discrete motifs. 

(C) Previously characterized subunit-substrate motif interactions. Top: CCT3 binds retroviral proteins p6 from HIV and p4 from MPMV. Bottom: CCT1 binds Box1, 
a short linear element from VHL. 

(D) Kinetic analysis of cognate (C) and noncognate (NC) interactions between purified apical domains and substrate-derived motifs. 

(E) Summary of apparent kinetic parameters for cognate and noncognate interactions, kon, blue bars; koff, orange bars. 

(F) Summary of apparent binding constants for cognate and noncognate interactions, Kd, red bars. 



of electrostatic charges within the folding chamber (Leitner et al., 
2012). 

The principles driving TRiC substrate recognition are poorly 
understood. In vivo, TRiC folds a subset of cellular proteins, sug- 
gesting a degree of specificity; however, its substrates are func- 
tionally and structurally diverse, indicating the potential to bind a 
broad array of proteins. The apical domains of each TRiC subunit 
are thought to recognize different motifs in substrates (Spiess 
et al., 2004; Spiess et al., 2006) (Figures 1A and 1B). However, 
to date, no precise structural or sequence rules for TRiC-sub- 
strate binding have been identified. We here integrate biophysi- 
cal and computational structural biology approaches with chem- 
ical crosslinking and mass spectrometry (XL-MS) to define the 
basis of TRiC-substrate recognition. We find that unique sub- 
unit-specific patterns of polar and hydrophobic residues underlie 
the distinct substrate binding properties of each subunit in the 
complex. The diversification of TRiC subunits thus provides a 
modular menu of binding specificities that allows for combinato- 
rial recognition of substrate polypeptides. This likely contributes 



to TRiC’s unique ability to fold structurally diverse and topologi- 
cally complex substrates. Evolutionary analyses further suggest 
that diversification of TRiC subunits from its simpler archaeal an- 
cestors enabled the expansion of eukaryotic genomes to acquire 
proteins with novel folds and functions. 

RESULTS 

Kinetic Analysis of Substrate Motif Recognition by TRiC 
Apical Domains 

To understand the molecular basis of this recognition specificity, 
we exploited substrates where the cognate CCT subunit and 
the relevant substrate motif have been identified (Figure 1C). 
The 54 amino acid-long HIV protein p6, and the related protein 
p4 from MPMV, associate directly with subunit CCT3 of TRiC 
(Hong et al., 2001). A short 6-9 amino acid-long hydrophobic 
motif in VHL, called Box1, contacts subunit CCT1 (Spiess 
et al., 2006). Importantly, the isolated recombinant apical domains 
of each TRiC subunit retain the ability to bind substrates 
and substrate-derived motifs with the specificity of the same 
subunits within the intact complex (Spiess et al., 2006; Tam 
et al., 2006, 2009). 

We used purified HIV-p6 (herein p6) and VHL-Box1 (herein 
Box1) to examine the association of TRiC apical domains of 
CCT1 (herein ApiCCT1) and CCT3 (ApiCCT3) (Spiess et al., 
2006) with cognate and noncognate substrate-recognition motifs 
(Figure 1D). A surface plasmon resonance (SPR)-based assay 
measured association and dissociation kinetics for ApiCCT-substrate 
pairs (Figures 1E and 1F and Figure S1 available online). 
Binding kinetics of immobilized VHL-Box1 and HIV-p6 to their 
cognate and noncognate ApiCCT binding partners were monitored 
by SPR over a range of concentrations (Figures 1D-1F 
and S1A-S1D). Apparent association and dissociation rates 
(Figure 1E) and binding constants (Figure 1F) were calculated 
from the sensorgrams (Figure S1). These indicated that the ratio 
of association over dissociation rates, i.e., the overall affinity, 
was higher for the cognate ApiCCT-substrate pairs (Figure 1E), 
consistent with the specificity of these motifs for these subunits 
within the TRiC complex. The measured on-rates, determined at 
approximately 10^ s“\ were markedly slower than diffu- 

sion-controlled binding (Figure 1E, blue bars) but consistent 
with the relatively slow substrate-binding kinetics of TRiC (Melki 
et al., 1997). Cognate interactions exhibited slower dissociation 
kinetics than noncognate interactions (Figures 1E and S1A-S1D). 
Both association and dissociation rates contribute to 
substrate specificity for different subunits. For p6, the difference 
between cognate and noncognate interaction was largely driven 
by dissociation rates, whereas for Box1, cognate and noncognate 
discrimination was a result of differential on- and off-rates 
(Figure 1E). Of note, even the cognate interactions are relatively 
weak, with an overall affinity of approximately 0.25-0.5 μM (Figure 1F). 
Accordingly, stable TRiC binding to most substrates will 
depend on multivalent recognition of several elements in the 
polypeptide by several subunits in the chaperonin. 
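
For readers less familiar with SPR kinetics, the relationship between the apparent rates and the apparent binding constant can be sketched as follows, assuming a simple 1:1 binding model (Kd = koff / kon). The function names are illustrative, and the rate constants in the example are hypothetical values chosen only to land in the sub-micromolar range; they are not measurements from this study.

```python
import math


def dissociation_constant(k_on, k_off):
    """Apparent Kd (M) from k_on (M^-1 s^-1) and k_off (s^-1), 1:1 model."""
    if k_on <= 0 or k_off < 0:
        raise ValueError("k_on must be positive and k_off non-negative")
    return k_off / k_on


def fraction_bound(free_ligand_conc, kd):
    """Equilibrium fraction of sites occupied at a given free ligand (M)."""
    return free_ligand_conc / (free_ligand_conc + kd)


# Hypothetical rate constants, for illustration only.
k_on_example = 1.0e4    # M^-1 s^-1 (assumed)
k_off_example = 5.0e-3  # s^-1 (assumed)
kd_example = dissociation_constant(k_on_example, k_off_example)  # 5e-7 M
half_occupancy = fraction_bound(kd_example, kd_example)
```

Under this model, a cognate pair can gain affinity over a noncognate pair through a faster on-rate, a slower off-rate, or both, which is the distinction drawn above between p6 and Box1.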

NMR Chemical-Shift Mapping of ApiCCT3 Identifies the 
Substrate-Binding Interface 

We focused on the ApiCCT3 and p6 interaction pair to gain 
a deeper structural understanding of TRiC-substrate recognition. 
NMR-based chemical shift (CS) mapping was used to 
identify the substrate-recognition interface in ApiCCT3 (Figures 
2 and S2). The ¹H Heteronuclear Single Quantum Coherence 
(HSQC) spectrum of ApiCCT3 yielded well-resolved and dispersed 
spectra, accounting for 142 of 167 peaks, covering 
85% of the protein sequence (Figures 2A and S2A and not 
shown). Standard triple-resonance backbone experiments, 
guided by specific amino acid labeling to anchor the sequence 
connectivities, allowed us to successfully assign >85% of the 
peaks in the 2D HSQC spectrum, including all the ApiCCT3 residues 
perturbed upon substrate addition (Figures S2A-S2D). 
Titration of increasing amounts of unlabeled p6 into ¹⁵N-labeled 
ApiCCT3 produced concentration-dependent shifts in a specific 
subset of peaks (Figures 2A and 2B); five peaks were strongly 
perturbed (>0.2 ppm), and another four peaks were perturbed 
weakly (>0.1 ppm; Figure 2B). Similar experiments were performed 
with p6-related protein p4 from M-PMV, which binds 
CCT3 with lower affinity (Hong et al., 2001; data not shown). 
p4 addition affected the same residues in ApiCCT3 as p6 did 
(data not shown), albeit to a lower extent. In contrast, no perturbations 
were observed upon addition of Box1 (data not shown). 

Given that Y247 in ApiCCT3 (Figures 2A and 2B) was strongly 
perturbed upon substrate binding, we used ¹⁹F-NMR on 3F-tyrosine-labeled 
ApiCCT3 for an orthogonal assessment of the 
binding interface (Figures 2C and S2E). 1D ¹⁹F-NMR spectra of 
3F-tyrosine-labeled ApiCCT3 revealed five discrete peaks, consistent 
with the five tyrosine residues in ApiCCT3 (Figure 2C). 
Systematic tyrosine-to-phenylalanine point mutations assigned 
each peak to unique tyrosine residues (Figure S2E). Upon addition 
of p6, one of the peaks exhibited a well-defined 0.2 ppm 
shift. In good agreement with our chemical-shift mapping, the 
perturbed peak corresponded to the ¹⁹F-tyrosine peak of Y247 
(Figure 2C). 

Structural Model of ApiCCT3 from NMR Backbone 
Chemical Shifts 

Guided by NMR-CS information (Figure 2Di), we used CS-Rosetta 
and modeling to gain a structural understanding of 
ApiCCT3 in the substrate-bound conformation (Shen et al., 
2009) (see Experimental Procedures and Figures S4D and 
S4E). The lowest energy models were comparable to the deposited 
ApiCCT3 structure without substrate (Pappenberger et al., 
2002). Of note, our NMR-derived structural model resolved the 
apical protrusion, not resolved in the ApiCCT3 crystal structure 
and shown to be intrinsically disordered in a previous NMR study 
of an archaeal apical domain obtained without substrate (Heller 
et al., 2004). 

Our NMR-derived model provided insight into the conforma- 
tional dynamics of the apical domain (Figure 2Dii). The regions 
of higher mobility in the structure included the helical protru- 
sion (Figures 2Dii and 2E, red) and to a lesser extent the flexible 
loop adjacent to helix 11 (Figures 2Dii and 2E, herein PL for 
proximal loop). The highly flexible helical protrusion is involved 
in formation of the closed lid (Heller et al., 2004), but its role in 
the open chaperonin conformation is not well understood. 
Interestingly, Y247, whose chemical shift was strongly per- 
turbed by substrate binding, is at the “hinge” between the flex- 
ible lid-forming protrusion and the apical domain. Y247 may 
participate in both substrate recognition and modulating the 
conformation of the lid protrusion for subsequent release 
when the closed lid forms. 

Mapping the substrate-induced CS-perturbations (Figure 2B) 
onto the ApiCCT3 structure revealed a continuous and extensive 
interaction surface spanning three sets of secondary structure 
elements (Figures 2Diii, 2E, and 2F). The CCT3 substrate interaction 
interface is primarily defined by a shallow groove formed between 
helix 11 and the PL and comprises approximately 700 Å². 
The core of the substrate-binding site was relatively constrained, 
consisting of residues on the surface of relatively rigid helix 
11 (herein H11) (residues 296-306). The distal portions corresponded 
to more flexible elements, including the PL adjacent 
to H11 (residues 223-232) and the hinge connecting to the 
flexible lid-forming protrusion including Y247. Supporting this 
analysis, we obtained low-resolution diffracting crystals of the 
p6-ApiCCT3 complex, which, following model building and 
refinement, demonstrated an additional density in the same 

Figure 2. NMR Chemical-Shift Mapping of ApiCCT3 Identifies the Substrate-Binding Interface 

(A) 2D NMR spectra of ApiCCT3 with increasing amounts of unlabeled p6. Blue, purple, red, and orange, respectively, show perturbations at 0-, 5-, 10-, 
and 20-fold excess p6 peptide. Insets show perturbed peaks. 

(B) Chemical-shift perturbation due to p6 binding mapped to the ApiCCT3 sequence. 

(C) ¹⁹F-tyrosine spectra of ApiCCT3 (black trace). Addition of p6 (red trace) produces a discrete perturbation of Y247. Right panel: ApiCCT3 structure with Y247 in 
red and the remaining tyrosines in blue. 

(D) (i) Integration of NMR chemical-shift parameters with CS-Rosetta. (ii) Cα variability across a subset of low-energy ApiCCT3 models highlights mobile regions. 
ApiCCT3 shown as cartoon and colored according to Cα rmsd. (iii) p6-induced chemical shifts map to a contiguous ApiCCT3 surface. Surface representation of 
ApiCCT3 is colored according to the chemical shift from yellow (0) to red (0.2 ppm). 

(E and F) Substrate-binding region on ApiCCT3, defined by helix 11 (H11), a loop (PL), and a residue at the hinge of the flexible loop (Y247): (E) Cartoon 
representation colored according to flexibility (Cα-rmsd as in Dii); (F) colored according to chemical properties of side chains: basic (blue), acidic (red), polar 
(white/gray), and nonpolar (yellow). Top: cartoon representation of binding site; bottom: simplified box schematic depicting amino acids on the substrate-binding 
site; upper line: H11 and protrusion hinge residue; lower line: PL sequence. 



region of the apical domain identified through NMR (L.A.J., R. 
McAndrew, J.F., and P. Adams, unpublished data). 

Previously characterized chaperone-binding sites, such as 
those of Hsp70 and the bacterial chaperonin GroEL, rely pre- 
dominantly on the recognition of hydrophobic determinants 
(Ashcroft et al., 2002; Chen and Sigler, 1999; Hua et al., 2001; 
Rudiger et al., 1997; Swain et al., 2006). In contrast, the substrate-binding 
site of ApiCCT3 contained a mixture of hydrophobic 
and polar residues (Figure 2F). In addition to Y247, H11 contributes 
hydrophobic (L299, M305), polar (Q301), and charged 
(R306, D298, H302) residues, whereas the PL immediately below 
presents a contiguous stretch of basic residues (H226, R228, 
R230, R231) and a single hydrophobic (M229) residue (Figure 2F, 
schematically represented by the box diagram in Figure 2F, 
bottom). Such a shallow, extensive binding surface comprising 
hydrophobic and polar residues is very different from the mostly 
hydrophobic substrate-binding sites of Hsp70 and GroEL. 

Mutational Analysis Links Chemical Properties of 
Substrate-Recognition Site to Binding Kinetics 

We next designed and purified a large unbiased panel of alanine 
substitutions in ApiCCT3, comprising 31 surface-exposed 

Figure 3. Mutational Analysis of CCT3 Links Chemical Properties of Substrate-Recognition Site to Binding Kinetics 

(A) Kinetic scheme of ApiCCT3:p6 interaction (kon in blue, koff in orange). 

(B-D) Exemplary SPR titrations of ApiCCT3 variants interacting with an immobilized p6 peptide, performed as in Figure 1. (B) ApiCCT3-WT, (C) ApiCCT3-R230A, 
and (D) ApiCCT3-L299A. Association phase is in blue, and dissociation phase is in orange. Each titration (black curves) included an analyte dilution series of 50, 
25, 12.5, 6.25, and 3.125 μM apical domain. 

(E-H) Summary of fitted kinetic parameters for ApiCCT3 mutants (see also Table S2 for details). (E and G) Bar graphs showing the fold change relative to WT 
apparent rates for (E) association rate kon (blue, kon,WT/kon,mut) and (G) dissociation rate koff (orange, koff,mut/koff,WT) from p6 substrate. (F and H) Fold change in 
rates mapped onto the ApiCCT3 structure highlight the clustering of surfaces based on Ala mutants perturbing binding kinetics through (F) association and (H) 
dissociation rates. ApiCCT3 structure is colored according to fold change in kinetic contribution; blue is equal to WT, and green is perturbed as indicated. 

(I) Calculated apparent Kd for WT ApiCCT3 and each alanine mutant. The apparent Kd of p6 for ApiCCT1, also measured by SPR, is included for comparison. 

residues in the 167-residue apical domain (Figures 3 and S3). The
global impact of these mutations on p6 binding was determined
with SPR (Figures 3 and S3 and data not shown). Of the initial 31,
10 mutations had no impact on p6 binding and were not examined
further, except for a few mutants, such as Y274A, chosen
as a control (Figure S3). We determined kinetic binding parameters
for 22 alanine mutants, including the one without effect as a
control, by carrying out full titration series with SPR, followed by
kinetic global fitting of the data (Figures 3B-3D, Table S1, and
summarized in Figures 3E, 3G, and S3). This unbiased mutational
analysis of ApiCCT3 independently confirmed the NMR-based
identification of the substrate-binding site (Figure 2).
Thus, mutation of residues perturbed by NMR mapping dramatically
affected p6 binding (Figures 3E-3H), whereas mutation at
control sites (e.g., Y274 to A274) did not (Figure S3).
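
To illustrate what a titration series of this kind looks like under the simplest binding model, the following Python sketch simulates association and dissociation phases for the two-fold dilution series listed in the Figure 3 legend (50 to 3.125 μM). It assumes a 1:1 Langmuir interaction, which this section does not confirm as the model used for the global fits. The rate constants, `RMAX`, and the phase durations are hypothetical placeholders, not values from Table S1.

```python
import math

def simulate_sensorgram(conc_molar, kon, koff, rmax, t_assoc, t_dissoc, n_points=50):
    """Simulate one SPR curve under an assumed 1:1 Langmuir model.

    Returns (association_responses, dissociation_responses).
    """
    k_obs = kon * conc_molar + koff            # observed association rate (1/s)
    r_eq = rmax * kon * conc_molar / k_obs     # steady-state response at this concentration
    assoc = [r_eq * (1.0 - math.exp(-k_obs * t_assoc * i / n_points))
             for i in range(n_points + 1)]
    r_end = assoc[-1]                          # response when analyte flow stops
    dissoc = [r_end * math.exp(-koff * t_dissoc * i / n_points)
              for i in range(1, n_points + 1)]
    return assoc, dissoc

# Two-fold dilution series from the legend: 50, 25, 12.5, 6.25, 3.125 (in micromolar).
concentrations_um = [50 / 2 ** i for i in range(5)]

# Hypothetical parameters chosen only for illustration.
KON = 1.0e3    # 1/(M*s)
KOFF = 1.0e-2  # 1/s
RMAX = 100.0   # response units

curves = {c: simulate_sensorgram(c * 1e-6, KON, KOFF, RMAX, 120.0, 240.0)
          for c in concentrations_um}
```

In a global fit, a single `kon`/`koff` pair is adjusted so that all five simulated curves match the measured ones at once, which is why a full dilution series constrains the rates better than any single concentration.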

Our analysis reveals the kinetic underpinnings of TRiC-substrate
recognition (Figures 3F, 3H, and 3I). Substrate association
(kon) and dissociation rates (koff) were very differently affected by
mutations at discrete positions in the binding site, clustered into
two distinct regions (Figures 3F, 3H, and 3I). The association kinetics
were predominantly perturbed by mutations of positively
charged residues in the flexible PL (Figures 3F and 3I). On the
other hand, the contribution to the dissociation rate is distributed
across both H11 and the PL. Residues on H11, particularly a mix
of nonpolar, polar, and charged side chains, contributed predominantly
to the dissociation rate (Figures 3H and 3I). In the
PL, the arginine residues allow both charge-charge interactions,
likely contributing to the association rates, as well as cation-pi
and aliphatic chain interactions with nonpolar residues, which
likely contribute to the dissociation rates. Thus, the mixed
chemical nature of the CCT3-binding site, combining polar and
hydrophobic residues, establishes a dual mode of substrate
recognition (Figure 3J). The overall contribution of H11 and the
PL to the binding constant is distributed across residues L299,
Q301, H302 and R228, R230, R231, respectively. These findings
resonate with studies of the interfaces between folded proteins,
where a core of hydrophobic residues contributes to dissociation
rates and polar interactions at the periphery drive association
and orientation (Bogan and Thorn, 1998; Clackson et al.,
1998).
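
The relationship between the two rates and the apparent affinity can be made concrete with a short Python sketch. It uses the fold-change definitions given in the Figure 3 legend (kon,WT/kon,mut and koff,mut/koff,WT) and the standard relation Kd = koff/kon for a simple two-state interaction. The numerical rate constants and the mutant shown are hypothetical, chosen only to show the arithmetic.

```python
import math

def apparent_kd(kon, koff):
    """Apparent dissociation constant (M) for a simple two-state interaction."""
    return koff / kon

def fold_changes(wt, mut):
    """Fold changes as defined in the Figure 3 legend; values > 1 mean weaker binding."""
    return {
        "kon_fold": wt["kon"] / mut["kon"],     # slower association raises this ratio
        "koff_fold": mut["koff"] / wt["koff"],  # faster dissociation raises this ratio
    }

# Hypothetical rate constants (not taken from Table S1).
wild_type = {"kon": 2.0e3, "koff": 2.0e-2}        # 1/(M*s), 1/s
example_mutant = {"kon": 5.0e2, "koff": 8.0e-2}

changes = fold_changes(wild_type, example_mutant)
kd_fold = apparent_kd(**example_mutant) / apparent_kd(**wild_type)

# The fold change in Kd is the product of the two kinetic fold changes.
assert math.isclose(kd_fold, changes["kon_fold"] * changes["koff_fold"])
```

Because the Kd fold change factorizes this way, two mutants with the same loss of affinity can be distinguished by whether the loss comes from the on-rate, the off-rate, or both, which is the distinction drawn between the PL and H11 residues above.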

NMR Identification of the Chaperonin-Binding 
Determinants in the Substrate 

NMR CS-mapping next identified the chaperonin-binding site in
the substrate (Figure 4). HSQC spectra of ¹⁵N-labeled p6
were assigned, and CS-NMR was used to model the peptide
structure and identify determinants recognized by ApiCCT3 (Figures
4A and S4A). Adding unlabeled ApiCCT3 to ¹⁵N-p6 caused
a concentration-dependent chemical shift perturbation in a subset
of peaks (Figures 4A and 4B). These peaks mapped to the
contiguous and highly conserved S41-N45 element at the p6 C
terminus, consisting of both nonpolar and polar residues (S41-N45;
Figure 4C).
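
As a rough illustration of how perturbed peaks can be picked out of an HSQC titration, the Python sketch below combines ¹H and ¹⁵N shift changes into a single per-residue value and flags residues above a cutoff. Both the nitrogen weighting factor (0.14) and the cutoff (mean plus one standard deviation) are common conventions adopted here as assumptions; this section does not state which scoring the authors used. The residue labels and shift values are hypothetical.

```python
import math
from statistics import mean, pstdev

def combined_csp(delta_h_ppm, delta_n_ppm, n_weight=0.14):
    """Weighted Euclidean chemical shift perturbation (ppm).

    n_weight down-scales 15N changes to the 1H scale; 0.14 is an assumed
    conventional value, not one taken from the paper.
    """
    return math.sqrt(delta_h_ppm ** 2 + (n_weight * delta_n_ppm) ** 2)

def perturbed_residues(shifts, n_sd=1.0):
    """Return residues whose combined CSP exceeds mean + n_sd * SD."""
    csp = {res: combined_csp(dh, dn) for res, (dh, dn) in shifts.items()}
    values = list(csp.values())
    cutoff = mean(values) + n_sd * pstdev(values)
    return sorted(res for res, value in csp.items() if value > cutoff)

# Hypothetical (delta 1H, delta 15N) shifts in ppm for five placeholder residues.
example_shifts = {
    "res_a": (0.002, 0.01),
    "res_b": (0.003, 0.02),
    "res_c": (0.050, 0.30),
    "res_d": (0.001, 0.01),
    "res_e": (0.004, 0.02),
}

flagged = perturbed_residues(example_shifts)
```

With these placeholder values only `res_c` clears the cutoff; in a real titration the same calculation would be repeated at each ligand excess to confirm that the perturbation is concentration dependent.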

To determine the role of the S41-N45 element in chaperonin
recognition, we generated and purified the penta-alanine substitution
p6SLFGN=>AAAAA (herein p6mut). To determine whether the
SLFGN=>AAAAA mutation affected ApiCCT3 binding, we used
NMR to examine the perturbation of the ¹⁵N-ApiCCT3 spectra
upon titration of unlabeled p6mut (Figure 4D). Mutation of the
S41-N45 motif largely abrogated the p6-induced chemical shift
perturbations in the ApiCCT3 spectrum, indicating that this
element mediates chaperonin binding (Figure 4D). We next
examined the role of S41-N45 in the interaction of p6 with intact
TRiC/CCT (Figure 4E). Purified p6WT or p6mut, labeled with an
N-terminal biotin tag, was incubated with mammalian cell extracts,
which contain the intact hetero-oligomeric TRiC complex
(Figure 4E). Following biotin affinity isolation, the p6 interaction
with endogenous TRiC was evaluated by immunoblot analysis
(Figure 4E). As expected, p6 bound TRiC, whereas p6mut did
not (Figure 4E). These orthogonal approaches support the
conclusion that the S41-N45 motif in p6 is indeed the TRiC binding
site. To further corroborate this conclusion, we examined the
chemical shift perturbations induced upon incubation of ¹⁵N-p6
with intact purified TRiC (Figure S4D). Indeed, the same subset
of p6 residues was affected upon addition of intact TRiC, indicating
that the ApiCCT3-binding site of p6 mediates its interaction
with TRiC.

CS-Rosetta was next used to derive the solution structure of
p6 (Figures 4Fi, S4B, and S4C). Analysis of the top-scoring
models showed that p6 in aqueous solution contains a structured
helical element at the C terminus and a flexible polar region
(Figure 4G). The C terminus of p6, containing S41-N45, adopts a
helical conformation (Figure 4Fi). The ApiCCT3-interacting residues
L42, F43, and N45 map to one face of the helix (Figure 4Fii).
In contrast, the N terminus is highly dynamic, as highlighted by
the Cα-rmsd map (Figures 4Fi, S4B, and S4C). Circular dichroism
(CD) measurements revealed only a very subtle decrease
in helicity when p6 binds the chaperonin, consistent with the
weak helicity observed in the NMR-guided structural model of
p6 (Figures S4B and S4E). Interestingly, whereas the helical
chaperonin-binding determinant has significant hydrophobic
character, the flexible N terminus contains a series of acidic residues
(Figure 4G). The structural model of p6 implies that these
acidic residues may interact with the basic residues in the
ApiCCT3 loop, providing a molecular rationale for the electrostatic-driven
association kinetics observed by SPR.

Structural Model of the Chaperonin-Substrate Interface 

RosettaDock with CS-derived site constraints was employed to 
obtain a structural model of the ApiCCT3-p6 complex (Figures 5 
and S5). The lowest energy model was fully consistent with the 
NMR data of both p6 and ApiCCT3 (Figures 5A and 5B), and 



(J) Residues that contribute to association and dissociation rates are shown as spheres on a cartoon model of ApiCCT3; perturbations affecting association rates
(blue) cluster in the PL; those affecting dissociation rates (orange) cluster on H11; and the hinge is at the apical protrusion. Inset, surface representation of
ApiCCT3-p6-binding site colored according to amino acid chemical property: basic in blue, acidic in red, polar in white, and nonpolar in yellow.

(K) Differential kinetic contribution of two regions in ApiCCT3 to substrate binding: charge-charge interactions between basic and acidic residues in blue and red,
respectively, control association rates (R228, R230, R231 also contribute to off-rates), whereas a mix of nonpolar, polar, and van der Waals interactions, shown in
yellow, control dissociation rates.
Figure 4. NMR Identification of the Chaperonin-Binding Determinants in the Substrate

(A) NMR-based chemical-shift mapping strategy identifies the p6-interacting surface.

(B) 2D NMR spectra of p6 with increasing amounts of unlabeled ApiCCT3. Orange, red, purple, and blue, respectively, show perturbations at 0-, 2.5-, 5-,
and 10-fold excess ApiCCT3.

(C) Chemical-shift perturbation mapped onto the p6 sequence colored by ApiCCT3 perturbation.

(D) Mutant p6mut (p6SLFGN=>AAAAA) does not induce the chemical-shift perturbations in the ApiCCT3 spectrum observed with p6WT. Perturbations mapped on the
ApiCCT3 sequence and on the ApiCCT3 surface, exactly as in Figure 2F.

(E) p6mut mutation abrogates the binding to endogenous TRiC observed for p6WT. Biotinylated p6WT or p6mut was incubated with extracts from human HEK293
cells, affinity isolated via the biotin tag (IP). TRiC in the IP is visualized by western blot. Total: 1% input to the IP.

(F) Structural model of p6 from CS-NMR. (i) C-α variability of p6: cartoon representation colored according to variability across a subset of low-energy models, as
indicated. (ii) Surface representation of p6 colored according to chemical-shift perturbation by ApiCCT3 addition, as indicated.

(G) Chemical properties of p6, colored according to amino acid properties: blue is basic, red is acidic, white is polar, and yellow is nonpolar.



the interface agreed with all our experimental data, including CS
perturbations (Figures 5C, 1, and 3) and mutagenesis (Figures 2,
3, and 5D).

The ApiCCT3-p6 structure provides unprecedented detail on
chaperonin-substrate binding (Figures 5A and 5B). p6 makes
tight packing interactions with unique features in the ApiCCT3
H11 and PL region through the specific presentation of side
chains (Figures 5A and 5B). The interface consists of two distinct
regions, highlighting the dual nonpolar and polar nature of binding
and providing a rationale for the bipartite substrate-binding
mode observed in the kinetic analyses. The interaction core is
established by a mix of nonpolar and polar interactions, centered
on H11 (Figures 5B-5D; namely L299, H302, M305, Q301, and
Y247 in ApiCCT3 and L42, F43, and N45 in p6). A region of
mostly electrostatic interactions is centered primarily on the
PL, between positively charged residues in ApiCCT3 and acidic
residues in p6 (Figure 5B). These charge-charge interactions
both confer specificity for discrete elements in the substrate
and serve to orient the substrate upon binding the apical domain
(Figure 5B). Additional nonpolar contacts make close packing
interactions with the aliphatic chains of lysine and arginine in
the apical domain. As a result, of the 1073 Å² of buried surface
Figure 5. Structural Model of the Chaperonin-Substrate Interface

(A and B) Structural model of ApiCCT3-p6. (A) ApiCCT3 and p6 colored in gray and purple, respectively. Interfacial residues in stick representation are colored in
teal and magenta. Inset, zoom-in of the interface at 90° rotation. (B) ApiCCT3 substrate-binding interface in surface representation colored according to electrostatic
potential; blue, red, and white are positive, negative, and neutral residues, respectively. Bound p6 is shown in teal with interfacial residues in stick
representation.

(C-E) Agreement between experimental data and structural model of ApiCCT3 (surface) and p6 (ribbon): (C) Reciprocal chemical-shift perturbations in p6 (blue-green)
and ApiCCT3 (yellow-red) upon complex formation. (D) Mutagenesis perturbation of binding measured by apparent Kd. (E) Evolutionary conservation
among orthologs of CCT3 (yellow-red) and p6 (blue-green).

(F-H) Structural model of ApiCCT1-Box1. (F) ApiCCT1-Box1 complex in cartoon representation colored in gray and purple, respectively. Interfacial residues
shown as sticks and colored in teal and magenta. Inset, zoom-in of the interface at 90° rotation. (G) Electrostatic charge distribution of ApiCCT1 substrate-binding
site bound to the Box peptide. Surface representation of the ApiCCT1 substrate-binding site colored according to electrostatic charge potential; blue, red, and
white are positively, negatively, and neutrally charged residues, respectively. Bound Box1 in magenta with interfacial residues shown in stick representation. (H)
Mutagenesis analysis of CCT1 residues required for Box1 binding. Affinity-ranking SPR experiments assessed binding of alanine ApiCCT1 mutants for Box1.
ApiCCT1 residues that perturb binding are highlighted in red/orange on the cartoon putty representation.
Figure 6. Mapping the Contacts between the TRiC Hetero-oligomer and Full-Length Substrates

(A) XL-MS approach to map TRiC-substrate contacts.

(B) XL-MS-derived contact points between tubulin, actin, and Gag and specific sites on TRiC.

(C) Substrate crosslink sites mapped onto the surface representation of ApiCCT3. The substrate-binding interface in ApiCCT3 is highlighted as in Figure 2F. The
sites of substrate crosslinks shown as spheres are colored green, cyan, and blue for CCT2, CCT6, and CCT7, respectively.

(D) Comparison of H11/PL substrate-binding region of subunits CCT2, CCT6, and CCT7 colored according to amino acid properties; blue, red, and white
correspond to basic, acidic, and neutral amino acids. Bottom: schematic representation of binding region in box format as in Figure 2H, colored as indicated.

(E) TRiC crosslink sites mapped onto the Gag protein sequence. Vertical lines show sites where Gag crosslinks to the apical domains of CCT6 (cyan), CCT2
(green), and CCT7 (blue). The plot also maps the hydrophobicity (yellow) and polarity (purple) of Gag.

(F) Gag and tubulin crosslinks to TRiC mapped onto the folded substrate structures. (Top) Cartoon representation of N-terminal domain of Gag in red with flexible
region containing the site of CCT6 crosslink (K110, as spacefill) in cyan. (Bottom) Cartoon representation of tubulin heterodimer: β-tubulin in red and α-tubulin in
gray. Loops containing the two β-tubulin-TRiC crosslink sites (in spacefill: K58 in green to CCT2; K324 in cyan to CCT6) are shown in green and cyan.

area, 762 Å² correspond to nonpolar contacts and 311 Å² to polar
contacts.
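
The split of the buried surface can be checked with a few lines of Python. The three areas are the ones quoted in the text; the variable names and the percentage framing are added here for illustration only.

```python
# Buried surface areas quoted in the text, in square angstroms.
total_buried = 1073.0
nonpolar_area = 762.0
polar_area = 311.0

# The two components account for the whole interface.
assert nonpolar_area + polar_area == total_buried

nonpolar_fraction = nonpolar_area / total_buried  # about 0.71
polar_fraction = polar_area / total_buried        # about 0.29

print(f"nonpolar: {nonpolar_fraction:.0%}, polar: {polar_fraction:.0%}")
```

Roughly seven tenths of the interface is therefore nonpolar, in line with the description of a nonpolar core flanked by a smaller polar, largely electrostatic region.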

Integrating conservation across orthologs for ApiCCT3 and p6
sheds light on the potential coevolution of surfaces employed in
chaperone-substrate interaction (Figure 5E). The core interface
residues are conserved in both ApiCCT3 and p6 variants
across HIV clades. Conservation in this p6 region could reflect
the requirement for TRiC interaction and/or interaction with
orthogonal binding partners such as VPR and ALIX, which also
bind in this p6 region (Salgado et al., 2009).

We next employed a similar RosettaDock-based analysis to
obtain a structural model for VHL-Box1 in a complex with
the apical domain of CCT1 (Figures 5F, S5A, and S5B). The
ApiCCT1-Box1 structural model placed Box1 at the same H11/PL
region of ApiCCT1 where p6 binds ApiCCT3 (Figure 5F).
Box1 adopts an extended conformation upon binding (Figure 5G).
The side chains of L116 and W117 in Box1, known to
be critical for VHL binding to TRiC in vivo and in vitro (Feldman
et al., 2003), pack between H11 and the PL in ApiCCT1 (Figure 5F,
inset and Figure 5G). Interestingly, comparison of the
substrate complexes of CCT1 and CCT3 (Figure 5B versus Figure 5G)
shows that the substrate-binding surface of CCT1 is
more hydrophobic than that of CCT3, consistent with the higher
hydrophobicity of Box1 over p6. We validated the ApiCCT1-Box1
structural model using ApiCCT1 alanine-substitution mutagenesis
followed by affinity measurements (Figures 5H and
S5C). A set of 25 mutants in ApiCCT1 was purified, and their
interaction with Box1 was analyzed by an affinity-ranking SPR
approach (Figures 5H and S5C). Strikingly, the five ApiCCT1
alanine mutants that most significantly perturbed VHL-Box1
binding mapped to the interface predicted by the structure (Figure S5C).
These data, together with previous analysis of Box1
residues required for TRiC binding (Feldman et al., 2003), validate
the structural model for ApiCCT1-Box1. We conclude that
the groove formed between H11 and the flexible PL is the general
substrate-recognition site of TRiC/CCT subunits.

Mapping the Contacts between the TRiC Hetero- 
oligomer and Full-Length Substrates 

To extend our understanding of TRiC substrate recognition 
to full-length substrates, we used chemical crosslinking-mass 
spectrometry (XL-MS) to identify contact points between intact 
TRiC and three full-length physiological substrates: actin, 
tubulin, and HIV Gag (Figures 6 and S6). For actin and tubulin, 
the heterotypic TRiC crosslinks localized to substrate determi- 
nants previously implicated in TRiC binding by peptide arrays 
and mutagenesis (Table S2 and Figure 6B) (Hynes and Willison, 
2000; Ritco-Vonsovici and Willison, 2000; Rommelaere et al., 
1999). Both tubulin and Gag crosslinked to multiple TRiC subunits
through specific regions in the polypeptide (Table S2 and
Figure 6B), consistent with a multivalent contact between TRiC
and its substrates.

Exploiting the conservation of general architecture of TRiC
subunits, we mapped the location of substrate crosslinks to
CCT2, CCT6, and CCT7 (Figure 6C, green, cyan, and blue
spheres, respectively) onto the ApiCCT3 structure, highlighting
its substrate-binding surface (Figure 6C, red surface). Strikingly,
the substrate crosslinks are proximal to the apical domain
substrate-binding interface between CCT3-p6 and CCT1-Box1,
validating this region as the general location of the substrate-binding
site in all TRiC subunits.

Comparing the chemical properties of the H11/PL region in
subunits CCT2, CCT6, and CCT7 (Figure 6D) shows that the
substrate-binding site of each subunit has a distinct pattern of
hydrophobic and polar residues. Thus, the dual-recognition
mode observed for CCT3-p6 is a general feature of TRiC-substrate
recognition. Interestingly, analysis of the location of crosslink
sites in the substrate primary sequence (Figure 6E for Gag;
see also below; Figure S6C) indicated that chaperonin contact
points within the polypeptide are close to the boundary between
a nonpolar and polar region (Figure 6E, yellow trace: hydrophobicity;
purple trace: polarity). The distinctive combination of polar
and hydrophobic elements in both the substrate and each chaperonin
subunit H11/PL region may underlie subunit-specific interactions
(Figures 6 and S6C).
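
A sequence trace of the kind described for Figure 6E can be approximated with a sliding-window hydropathy average. The Python sketch below uses the Kyte-Doolittle scale as an assumption, since this section does not say which hydrophobicity scale the authors plotted. The SLFGN motif is taken from the p6 analysis above; the ten-residue test sequence, in which an acidic stretch precedes that motif, is a hypothetical construct and not the real p6 or Gag sequence.

```python
# Kyte-Doolittle hydropathy values (higher means more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def mean_hydropathy(segment):
    """Average hydropathy of a peptide segment given in one-letter code."""
    return sum(KYTE_DOOLITTLE[aa] for aa in segment) / len(segment)

def hydropathy_profile(sequence, window=5):
    """Sliding-window hydropathy averages along a sequence."""
    return [mean_hydropathy(sequence[i:i + window])
            for i in range(len(sequence) - window + 1)]

# Hypothetical peptide: an acidic stretch followed by the SLFGN motif.
example_sequence = "DEDEESLFGN"
profile = hydropathy_profile(example_sequence)

# The trace rises from the strongly polar window to the motif window.
polar_window, motif_window = profile[0], profile[-1]
```

A crosslink site that falls where such a trace changes sign or slope would sit at a boundary between a polar and a nonpolar segment, which is the pattern the paragraph above describes.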

We next mapped the TRiC-crosslink sites onto the folded 
structures of actin, tubulin, and Gag (Figure 6F). The chapero- 
nin-contact sites are proximal to both a structured hydrophobic 
region, either helix or strand, and a more unstructured polar loop 
(Figure 6F, top panel, N terminus of Gag; bottom panel, tubulin, 
not shown for actin). The tubulin crosslinks map to two surface 
loops at the tips of the N-terminal and C-terminal lobes of the 
protein (Figure 6F, bottom panel). The TRiC-binding sites overlap 
with the interface of the tubulin heterodimer, indicating that 
folded and assembled β-tubulin cannot bind to TRiC (Figure 6F,
bottom). The binding site for tubulin assembly factor Rbl2/ 
Cofactor A (CoA), which acts directly downstream of TRiC in 
tubulin assembly (Tian and Cowan, 2013), also overlaps with 
the Tub2 site of crosslink to CCT6 (You et al., 2004). The overlap- 
ping tubulin-binding sites for TRiC and CoA suggest possible 
mechanisms for Tub2 release from TRiC and indicate that the 
chaperonin protects this oligomerization surface from inappro- 
priate intra- and intermolecular interactions. 

To understand how the same subunit can bind distinct sub- 
strate motifs, we used the crosslinking information as a physical 
constraint to generate models of the CCT2-substrate interaction 
with Gag and tubulin (Figures 6G and S6A-S6C). The lowest energy
models placed both substrate-derived peptides in the
CCT2-binding site formed by H11 and the PL, even though 
the starting distance constraint, i.e., the site of crosslink, was 



(G) XL-MS-derived structural models of CCT2 apical domain-substrate interaction with substrate elements from crosslink sites of Gag (top) and tubulin (bottom).
Apical domain-peptide complexes are colored in gray and purple, respectively. Interfacial residues in stick representation are colored teal for the apical protein
and magenta for the peptide.

(H) Putative topological description of the TRiC-bound substrate (dashed line) for Gag (top) and tubulin (bottom). The open state of TRiC is shown in gray, with
subunits CCT2, CCT6, and CCT7 colored green, cyan, and blue, respectively. The Cα atoms of the lysines involved in the crosslink are shown as spheres. The immediate
proximity of the crosslink site on the peptide sequence is colored according to amino acid properties; yellow, white, blue, and red are nonpolar, polar, basic, and
acidic, respectively.
distal from this site. Gag and tubulin bind in different configurations
to the same apical domain of CCT2. Comparing all the
data and structural models obtained here for different apical 
domain-substrate complexes reveals common rules for TRiC 
recognition and specificity. The shallow groove created by 
H11 and the PL allows flexibility in binding, allowing the same 
apical domains to bind different substrates with no sequence 
similarity (Yam et al., 2008). H11 and the PL provide the apical 
domain surface for substrate recognition through a combination 
of polar and hydrophobic interactions. The specific polar-hydrophobic
pattern of both apical domain and substrate serves to
provide specificity and orient the substrate to dictate the binding
topology.

Most excitingly, the XL-MS analysis provided a topological 
description of the substrate when bound to TRiC (Figure 6H). 
Subunit-specific contacts provide anchors that determine a 
global configuration of substrate polypeptides bound to TRiC; 
for Gag, the polypeptide is stretched across the open complex, 
whereas for tubulin, the two contact points at the tips of its two 
lobes are at antipodal positions of the ring. Taken together, these 
data indicate that binding to TRiC orients and restricts the global 
topology of the bound substrate, perhaps allowing the domains 
to start folding while associated with the chaperonin. 

DISCUSSION 

How TRiC/CCT discriminates between non-native substrates 
and their folded counterparts is intriguing in view of its obligate 
requirement for folding a subset of cellular proteins that share 
no sequence or structural similarities. By defining the structural 
basis of substrate recognition, we begin to understand how 
subunit diversification enabled TRiC to balance the plasticity 
required to recognize a broad array of substrates with the spec- 
ificity required to assist their folding. 

Implications for Substrate Selection and Folding in the
Hetero-oligomeric Chaperonins 

Mapping the substrate binding in the apical domains of TRiC to a 
groove between H11 and the PL has fundamental implications 
for the mechanics of substrate folding (Figure 7A). Within the 
intact complex, the interaction with each subunit relies on a 
recognition code integrating polar and hydrophobic contribu- 
tions, which enables combinatorial substrate recognition (Fig- 
ure 7Ai). Polar contacts orient the substrate locally upon binding, 
and the distribution of subunit-specific substrate interactions 
stipulates the global topology of the TRiC-bound polypeptide. 
This may direct folding of TRiC-bound substrates along a 
preferred pathway (Figure 7Aii). Our mapping of the substrate- 
binding site provides a compelling mechanism of substrate 
release upon ATP-dependent closure (Douglas et al., 2011) (Fig- 
ure 7Aiv). ATP-induced closure of the lid brings the PL region in 
one apical domain into direct contact with a loop in the neigh- 
boring subunit, termed RLS (release loop of substrate) (Fig- 
ure 7Aiii), which mediates substrate release locally, through 
ATP-induced contacts between adjacent apical domains (Doug- 
las et al., 2011) (Figure 7Aiv). This mechanism of release is well 
suited to the low affinity of each individual apical domain-substrate
interaction, as it permits the local displacement of the
substrate from the apical domain by the ATP-induced proximity
of the RLS. Because ATP binding and hydrolysis function within 
the ring is asymmetric (Reissmann et al., 2012) (Figure 7A, dark 
gray: high ATP affinity; light gray: low ATP affinity), it is possible 
that the substrate is released sequentially during the conforma- 
tional cycle (Figure 7A, brackets). The particular dissociation 
rates for a given subunit will determine when specific regions 
of the polypeptide are released from their binding sites into the 
folding chamber. By allowing certain subdomains to fold first, 
TRiC may promote productive folding trajectories. 

Principles Driving the Diversification of Recognition and 
Specificity in TRiC Subunits 

NMR, mutagenesis, modeling, and XL-MS indicate that all TRiC 
subunits contact substrates through the same region in their 
apical domains. This region is evolutionarily conserved across 
orthologs (i.e., across all CCT5 from eukaryotes; Figures 7C,
S7A, and S7B) but diverges across TRiC/CCT paralogs (Fig- 
ure 7B), suggesting a distinct and important function in each 
subunit. 

Chaperonin-binding sites balance plasticity and specificity
in substrate recognition through two modular elements that
discretely control binding and specificity: charged and polar
residues contribute to enhancing on-rates, and hydrophobic
residues contribute to decreasing off-rates. The polypeptide-binding
platform in the H11/PL region combines a rigid helical
element and a flexible loop. Substrate interaction involves
burial of nonpolar residues in the groove formed by the H11 helix/PL
loop region. The PL loop is highly variable among subunits,
in terms of both chemical properties and length (Figures S7B and
7C). Providing conformational flexibility to PL in the polypeptide-binding
groove may enable recognition of a larger set of substrates.
Among all subunits, CCT2 presents the most nonpolar,
classical “chaperone-like,” binding surface (Figure 7C); interestingly,
this subunit is crosslinked to all full-length substrates
examined.

The shallow nature of the chaperonin-binding groove allows a 
subunit to recognize motifs with different features, providing 
plasticity in binding, as shown for CCT2 (Figure 6). The low affin- 
ity for a single substrate-apical domain interaction is consistent 
with an avidity-driven interaction, whereby multiple discrete 
low-affinity contacts to different subunits mediate stable binding 
to the complex (Figure 7A). This combinatorial recognition 
suggests a simple model for the discrimination of folded from 
nonfolded proteins. TRiC will recognize those conformations in- 
teracting with more than one subunit but will not interact with 
proteins where most binding motifs are no longer available. 

The Evolution of Hetero-oligomeric Chaperonins 

It is intriguing to consider what drove the evolution of such a 
complex hetero-oligomeric folding machine. TRiC substrates 
tend to encode complex topologies, and many coevolved with 
TRiC to the point of being unable to fold in its absence (e.g.,
actin). Archaea have simpler chaperonins, ranging from one to 
five subunits depending on the organism (Bigotti and Clarke, 
2008). We considered whether changes in the proteome are 
linked to subunit diversification. Strikingly, comparing all or- 
ganisms containing TRiC-like chaperonins, we find a positive 
Figure 7. The Polypeptide-Binding Site of TRiC/CCT: Functional and Evolutionary Implications

(A) Role of subunit-specific substrate recognition in the context of the TRiC folding cycle. See Discussion for description. Subunits with high ATP affinity are in
dark gray, with low ATP affinity in light gray.

(B) Evolutionary divergence across H11/PL substrate-binding sites across S. cerevisiae paralogs (left) and conservation of binding site across orthologs (CCT5,
right panel).

(C) Substrate-binding-site properties of the different yeast TRiC subunits schematically shown in box format as in Figure 2H. Upper line corresponds to H11/
apical hinge residues, and lower line corresponds to PL. The substrate-binding site of archaeal chaperonin from M. maripaludis is included for comparison.

(D) The substrate-binding-site interface of the open TRiC conformation. TRiC is shown in semi-transparent gray cartoon, and substrate-binding sites in surface
representation colored according to chemical properties, as indicated.

(E) Group II chaperonin subunit complexity increases with proteome size. Number of subunits in archaeal (purple) and eukaryotic (green) organisms graphed
versus proteome size, as number of open reading frames (ORFs) in that organism.



correlation between subunit diversity and the size of the proteome
(Figure 7E). The possible link between subunit diversification
of TRiC and expansion of the proteome in eukaryotes raises
questions on the mechanisms linking protein evolution to
changes in chaperone structure and composition. Subunit diversification
may increase the probability that a given protein will
present two or more binding sites that can combinatorially bind
to the chaperonin and benefit from the mechanistic advantages
of folding in its chamber. Thus, the complexity of the chaperonin
appears functionally optimized for the complexity of the genome,
suggesting that the folding machinery contributes to dictating
proteome size. A better understanding of the substrates and recognition
motifs for the different TRiC and archaeal subunits may
provide exciting insights into protein evolution.

EXPERIMENTAL PROCEDURES 

Apical domains of TRiC were expressed and purified as described previously
(Spiess et al., 2006). A set of alanine mutants was cloned and purified based
on the chemical-shift perturbation experiments, and binding kinetics were
analyzed by SPR. All apical domains were soluble and folded, as assessed
by CD. For the SPR experiments, peptides were immobilized using maleimide
chemistry on a PEG-derivatized surface, and a dilution series of apical protein
was flowed over as analyte. For NMR, ApiCCT3 and p6
samples were expressed and purified using standard isotope-labeling procedures
(see Extended Experimental Procedures for details). NMR chemical-shift
mapping and backbone assignment experiments for ApiCCT3 and p6
were carried out using 300 μM protein sample on an 800 MHz Inova Varian
spectrometer outfitted with a cryogenic probe. Biotinylated p6WT or p6mut
peptides were incubated with extracts from human HEK293 cells and affinity
isolated via the biotin tag (IP). The presence of TRiC in the IP was visualized by
western blot analysis with antibodies raised against CCT5 and
CCT2. DSS-crosslinked TRiC-substrate samples were treated with trypsin,
enriched for crosslinked peptides by size-exclusion chromatography, and
analyzed by tandem mass spectrometry. Crosslinked peptides were identified
by xQuest (Rinner et al., 2008). Conservation scores were calculated using
Rate4site (Pupko et al., 2002) and mapped onto the models using ConSurf
(Ashkenazy et al., 2010). Generation of structural models was guided by experimentally
determined backbone chemical-shift parameters in CS-Rosetta
(Lange et al., 2012; Mao et al., 2014; Shen et al., 2009; van der Schot et al.,
2013). RosettaDock was used to model the ApiCCT-substrate complexes
with CS-NMR-based site constraints or with XL-MS-based atom pair constraints.
Homology model building for the remaining ApiCCT domains was
carried out using the Rosetta software package (Chivian and Baker, 2006;
Davis and Baker, 2009).
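
To show how a crosslink can act as an atom pair constraint when ranking docked models, the Python sketch below counts how many crosslinked lysine pairs in a model lie within a distance cutoff. This is not the RosettaDock constraint format or the authors' protocol. The 30 Å Cα-Cα cutoff for DSS is an assumed, commonly used upper bound, and the atom labels and coordinates are hypothetical.

```python
import math

# Assumed upper bound on the C-alpha to C-alpha distance bridged by DSS (angstroms).
DSS_CA_CA_CUTOFF = 30.0

def crosslink_satisfied(ca_a, ca_b, cutoff=DSS_CA_CA_CUTOFF):
    """True if two C-alpha positions are close enough to be bridged."""
    return math.dist(ca_a, ca_b) <= cutoff

def count_satisfied(model_coords, crosslinks, cutoff=DSS_CA_CA_CUTOFF):
    """Number of observed crosslinks that a model is compatible with."""
    return sum(
        crosslink_satisfied(model_coords[a], model_coords[b], cutoff)
        for a, b in crosslinks
    )

# Hypothetical C-alpha coordinates (angstroms) for three lysines in one model.
example_model = {
    "chaperonin_K1": (0.0, 0.0, 0.0),
    "substrate_K1": (10.0, 10.0, 10.0),  # about 17.3 angstroms from chaperonin_K1
    "substrate_K2": (40.0, 0.0, 0.0),    # 40 angstroms from chaperonin_K1
}
example_crosslinks = [
    ("chaperonin_K1", "substrate_K1"),
    ("chaperonin_K1", "substrate_K2"),
]

satisfied = count_satisfied(example_model, example_crosslinks)
```

Here the model satisfies one of the two hypothetical crosslinks; in practice such a count would be one filter among several, alongside the docking energy.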

SUPPLEMENTAL INFORMATION 

Supplemental Information includes Extended Experimental Procedures, seven
figures, and four tables and can be found with this article online at
http://dx.doi.org/10.1016/j.cell.2014.10.042.
SUMMARY

Cdc42 is a highly conserved master regulator of 
cell polarity. Here, we investigated the mechanism 
by which yeast cells never re-establish polarity at 
cortical sites (cytokinesis remnants [CRMs]) that 
have previously supported Cdc42-mediated growth 
as a paradigm to mechanistically understand how 
Cdc42-inhibitory polarity cues are established. We 
revealed a two-step mechanism of loading the 
Cdc42 antagonist Nba1 into CRMs to mark these 
compartments as refractory for a second round of 
Cdc42 activation. Our data indicate that Nba1 
together with a cortically tethered adaptor protein 
confers memory of previous polarization events to 
translate this spatial legacy into a biochemical signal 
that ensures the local singularity of Cdc42 activation. 
“Memory loss” mutants that repeatedly use the 
same polarity site over multiple generations display 
nuclear segregation defects and a shorter lifespan. 
Our work thus established CRMs as negative polarity 
cues that prevent Cdc42 reactivation to sustain the 
fitness of replicating cells. 

INTRODUCTION 

The establishment of cell polarity sites is fundamental for a 
plethora of cellular functions related to morphogenesis, differen- 
tiation, and/or proliferation of uni- and multicellular organisms 
(Bloch and Yalovsky, 2013; Dworkin, 2009; Heasman and Ridley, 
2008; Howell and Lew, 2012; Iden and Collard, 2008; Li and Bo- 
werman, 2010; Martin-Belmonte and Perez-Moreno, 2011; 
McCaffrey and Macara, 2009; Nelson, 2009; Noatynska et al., 
2013). The Rho-GTPase Cdc42, initially described in yeast 
(Johnson and Pringle, 1990), is a master regulator of cell polari- 
zation and highly conserved among eukaryotes (Boureux et al., 
2007; Etienne-Manneville, 2004). How Cdc42 polarity sites are 
regulated in space and time has been extensively studied over
the past years (Casamayor and Snyder, 2002; Johnson et al.,
2011; Park and Bi, 2007). However, much less is known about
the molecular mechanisms involved in the establishment of
polarity cues refractory to Cdc42 activation.

During G1 phase, yeast cells establish a polarity site from 
which the daughter cell will emerge. Polarity establishment de- 
pends on Rho GTPase Cdc42 (Park and Bi, 2007). The site of 
bud emergence (bud neck) will later be used in cytokinesis to 
separate the daughter from mother cell (Figure 1A) (Park and 
Bi, 2007; Wloka and Bi, 2012). The remnants of the cytokinetic 
machinery (bud scar or cytokinesis remnant [CRM]) are retained 
in the mother cell and can be visualized by transmission electron 
microscopy (TEM) or by specific dyes (Figure 1B) (Meitinger
et al., 2013). CRMs mainly consist of extracellular matrix, which 
is encircled by a chitin-rich ring (Cabib et al., 1993). In addition, 
transmembrane proteins, including Rax1 and Rax2, protrude
into the extra- and intracellular spaces (Kang et al., 2004). This 
indicates that CRMs may modulate intracellular processes. 
However, relatively little is known about the composition and 
function of CRMs. During the replicative lifetime, the aging 
mother cell accumulates increasing numbers of these CRMs 
over its cell surface (Casamayor and Snyder, 2002). Importantly, 
a new polarity site is never established within CRMs, indicating 
that Cdc42 activation cannot occur twice at the same site, 
even though the initiation event could be many generations after 
the cytokinesis that generated the remnant. 

The molecular basis for the prevention of Cdc42 activation at
CRMs remains to be established. Different GTPase-activating
proteins (GAPs) contribute to the inhibition of Cdc42 at the
cell-division site (Atkins et al., 2013; Tong et al., 2007). In addition, the
scaffold protein Gps1 inhibits Cdc42 at the site of cytokinesis
as part of a pathway working in parallel to the Cdc42 GAP Rga1
(Meitinger et al., 2013). However, neither Cdc42 GAPs nor Gps1
accumulate at old CRMs. Using a combination of proteomics,
cell biology, and biochemical approaches, we identified a protein
complex that is recruited to the cell-division site by Gps1 and
inherited by CRMs in a Gps1- and Rax1-Rax2-dependent manner.
We show that Nba1, a bud-neck-associated protein of unknown
function (Calvert et al., 2008), is the core component of this
complex that is responsible for the prevention of Cdc42 activation
in the remnant. Mutant cells that are unable to inhibit Cdc42
activation at CRMs display nuclear segregation defects and have
markedly shorter lifespans. This study therefore identifies the
molecular mechanism for a long-recognized phenomenon that
keeps CRMs inactive for Cdc42 reactivation, thereby ensuring
the longevity of asymmetrically dividing yeast cells.

RESULTS 

Nap1, Nba1, and Nis1 Function with Gps1 in Cdc42
Inhibition

Nap1, Nba1, and Nis1 were previously identified as putative
Gps1-interacting proteins (Meitinger et al., 2013). Nap1 is a
conserved histone chaperone involved in chromatin assembly
and morphogenesis (Ishimi and Kikuchi, 1991; Mortensen
et al., 2002), whereas Nba1 and Nis1 are proteins of unknown
function associated with the cell-division site (Calvert et al.,
2008; Iwase and Toh-e, 2001). Using coimmunoprecipitation
experiments, we established that Nap1, Nba1, and Nis1 interact
with Gps1 (Figure 1C). Fluorescence microscopy analysis
revealed that GFP-tagged Gps1, Nap1, Nba1, and Nis1 localize
to the cell-division site (Figure 1D). This localization was
cell-cycle-dependent for Nap1, Nba1, and Nis1 (Figures S1A and S1B
available online). In contrast, only Nba1 and Nis1 localized to
CRMs (Figure 1E).

Gps1 localized to the cell-division site independently of NAP1,
NBA1, or NIS1 (Figures 1F and S1C). Similarly, Nap1 localization
to the cell-division site was not affected in gps1Δ, nba1Δ, or nis1Δ
cells (Figures 1F and S1C). Nba1 failed to localize to the
cell-division site in the absence of GPS1 (Figures 1F and S1C), although
its protein levels were unaffected by deletion of GPS1
(Figure S1D). Nis1 localization to the cell-division site also required
GPS1. Moreover, NAP1 and NBA1 were necessary for Nis1
localization to the cell-division site (Figures 1F and S1C). Consistent
with this observation, coimmunoprecipitation of Nis1 with Gps1
was impaired in the absence of NAP1 or NBA1 (Figure 1C).

Interestingly, although GFP-tagged Gps1 and Nap1 did not
localize to CRMs, both were necessary for Nba1 and Nis1
localization to CRMs (Figures 1G and S1C). Impaired localization of
Nis1 to CRMs in nap1Δ cells was at least partially explained by
reduced levels of Nis1-GFP in this mutant (Figure S1D).
Furthermore, localization of Nba1 and Nis1 to CRMs was mutually
dependent (Figure 1G). Together, these results indicate that Gps1 and
Nap1 are involved in recruiting Nba1 and Nis1 to the cell-division
site and in loading Nba1 and Nis1 onto CRMs (Figure 1H).

Gps1 coordinates the activity of Cdc42 and Rho1, two members
of the Rho family of small GTPases (Meitinger et al.,
2013). In the absence of GPS1, abnormal activation of the
Cdc42 GTPase causes rebudding inside the old cell-division
site, resulting in the appearance of one or more concentric
collars of cell wall material at the cell-division site in 30%-40% of
the cells (Figure 1I) (Meitinger et al., 2013). In addition, deletion
of GPS1 also compromises the Rho1 cytokinetic pathway,
leading to a thinning of the secondary septum (Meitinger et al., 2013).
Transmission electron microscopy of nap1Δ, nba1Δ, and nis1Δ
cells revealed multiple collars at the bud neck without any
thinning of the secondary septum in all three mutants (Table 1A),
suggesting that Nap1, Nba1, and Nis1 are involved in regulating
Cdc42 but not Rho1. The multiple collar phenotype was more
frequent in the nba1Δ strain (Table 1A), indicating that Nba1
plays a major role in Cdc42 inhibition. Together, these results
established Nap1, Nba1, and Nis1 as Gps1-interacting proteins
that are specifically involved in inhibiting the Cdc42 pathway.
Furthermore, our findings suggest that inhibition of Cdc42 might
not be restricted to the cell-division site but could also occur at
CRMs, where Nba1 and Nis1 localize.
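
The comparison between mutants drawn from Table 1A can be made explicit by summing, for each genotype, the percentages of cells with one or more collars. The following is a minimal sketch in Python, assuming the collar percentages for the wild-type, gps1Δ, nba1Δ, nis1Δ, and nap1Δ strains as listed in Table 1A; the dictionary layout and the function name percent_with_collars are illustrative choices, not part of the original analysis.

```python
# Percent of cells with 0, 1, 2, and >2 collars, transcribed from Table 1A.
# The data structure and helper name are hypothetical, for illustration only.
collar_percentages = {
    "wild-type": (100, 0, 0, 0),
    "gps1Δ": (64, 28, 7, 1),
    "nba1Δ": (61, 31, 8, 0),
    "nis1Δ": (95, 5, 0, 0),
    "nap1Δ": (97, 3, 0, 0),
}


def percent_with_collars(row):
    """Return the percentage of cells showing at least one bud-neck collar."""
    return sum(row[1:])


# Summarize every genotype and list them from most to least affected.
summary = {g: percent_with_collars(r) for g, r in collar_percentages.items()}
for genotype, pct in sorted(summary.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{genotype}: {pct}% of cells with at least one collar")
```

Under this tally, the nba1Δ and gps1Δ strains fall in the 30%-40% range quoted above, whereas the nis1Δ and nap1Δ strains show collars in only a small fraction of cells.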

Nba1 Is Necessary and Sufficient to Inhibit Cdc42

We sought to understand how Gps1 interacts with Cdc42, Nap1,
Nba1, and Nis1. Yeast two-hybrid experiments with Gps1
truncations suggested that Gps1 has specific binding domains
for each of its interaction partners (Figures 2A and S2A).
In vitro, recombinant Gps1(293-422) (the fragment of Gps1 containing
amino acid residues 293-422) and two further Gps1 fragments
interacted directly and specifically with recombinant Nap1,
Nba1, and Nis1, respectively (Figure 2B). Based on these
results, we generated a Gps1-Δ513-598 mutant in which the Nba1
interaction domain was deleted. Nba1 failed to interact with
Gps1-Δ513-598 in the yeast two-hybrid assay (Figure 2C) and showed
impaired localization to the cell-division site and CRMs in
gps1-Δ513-598 cells (Figures 2D and 2E). Notably, gps1-Δ513-598
cells showed no defects in Rho1 regulation, as judged by the
thickness of the secondary septum, yet exhibited the multiple collar
phenotype with a frequency similar to that of gps1Δ and nba1Δ
mutants (Table 1B). This observation suggests that Nba1
functions downstream of Gps1 in the Cdc42 inhibition pathway
and that the main role of Gps1 is to recruit Nba1 to the
cell-division site and CRMs.

If this model is correct, then artificial targeting of Nba1 to the
cell-division site should eliminate the need for Gps1 in Cdc42
inhibition. We tested this possibility using the septin Shs1 tagged
with the GFP-binding protein (GBP) (Rothbauer et al., 2008) to
recruit GFP fusions to the cell-division site (Meitinger et al., 2013).
Expression of SHS1-GBP in NBA1-GFP gps1Δ cells resulted in
permanent association of Nba1-GFP with the cell-division site,
partially restored the localization of Nba1-GFP to CRMs, and
partially rescued the growth defect of the gps1Δ mutant (Figures
2F-2H). Notably, artificial tethering of Nba1 to Shs1 rescued the
multiple collar phenotype but not the secondary septum defect
of gps1Δ cells (Table 1C), indicating that NBA1-GFP gps1Δ
SHS1-GBP cells bypassed the requirement for Gps1 in the
regulation of Cdc42 but not of Rho1. Similar results were obtained
with NIS1-GFP gps1Δ SHS1-GBP cells (Figures S2B-S2D; Table
1D). However, Shs1-tethered Nis1 required NBA1 for function,
whereas Shs1-tethered Nba1 suppressed the gps1Δ growth
defect even in the absence of NIS1 (Figures 2H and S2D).
Collectively, our data indicate that Gps1 recruits Nba1 and Nis1 to the
cell-division site and CRMs, where Nba1 is necessary and
sufficient to inhibit Cdc42 activation.

Nba1 Inhibits Binding of the GEF Cdc24 to GTP-Bound
Rsr1

Next, we concentrated on the mechanism of Cdc42 inhibition by
Nba1. Two parallel mechanisms contribute to the site-directed
Table 1. Cdc42 Phenotype and Rho1 Phenotype of Different Mutants

                                      Cells with Indicated
                                      Number of Collars (%)    Septum Thickness (nm)
     No.  Genotype                    0     1     2     >2     Mean (SD)    p Value   n
A    1    Wild-type                   100   0     0     0      274 (98)     -         108
     2    gps1Δ                       64    28    7     1      128 (51)     <0.001    77
     3    nba1Δ                       61    31    8     0      231 (85)     0.006     51
     4    nis1Δ                       95    5     0     0      266 (98)     0.6       44
     5    nap1Δ                       97    3     0     0      361 (107)    <0.001    38
     6    rax1Δ                       96    4     0     0      258 (108)    0.4       46
     7    rax2Δ                       94    6     0     0      297 (107)    0.2       48
     8    rga1Δ                       44    40    14    2      317 (113)    0.01      43
     9    rga1Δ nba1Δ                 34    38    22    6      272 (95)     0.9       49
     10   rsr1Δ                       100   0     0     0      ND           ND        33
     11   nba1Δ rsr1Δ                 100   0     0     0      ND           ND        37
     12   rga1Δ rsr1Δ                 81    19    0     0      ND           ND        63
     13   nba1Δ rga1Δ rsr1Δ           72    22    …     2      ND           ND        54
B    1    gps1Δ GPS1                  100   …     …     …      318 (140)    -         27
     2    gps1Δ                       65    30    …     0      …            <0.001    37
     3    gps1Δ gps1-Δ301-422         98    3     0     0      274 (98)     0.2       64
     4    gps1Δ gps1-Δ443-530         100   0     0     0      332 (116)    0.6       38
     5    gps1Δ gps1-Δ513-598         62    31    7     0      322 (113)    0.8       42
C    1    NBA1-GFP                    100   …     …     0      231 (76)     …         49
     2    NBA1-GFP gps1Δ              60    32    …     4      94 (44)      <0.001    50
     3    …                           …     …     …     …      …            …         …
D    1    …                           …     …     …     …      …            …         …
     2    …                           …     …     …     …      …            …         34
     3    NIS1-GFP gps1Δ SHS1-GBP     98    …     …     …      …            …         42

ND, not determined.



activation of Cdc42: the formation of a trimeric complex
composed of the Cdc42 guanine nucleotide exchange factor
(GEF) Cdc24, the scaffold protein Bem1, and the p21-activated
kinase (PAK) Cla4, which can be reconstituted in vitro using
recombinant proteins (Figures S3A and S3B) (Bose et al., 2001;
Kozubowski et al., 2008), and the recruitment of Cdc24 by
the Ras-related GTPase Rsr1 (Park et al., 1997; Shimada
et al., 2004; Smith et al., 2013). We sought to determine
whether Nba1 affects activation of Cdc42 by these two
mechanisms.



In yeast two-hybrid experiments, Nba1 interacted with Bem1
and Cdc24 (Figure 3A). In vitro binding experiments suggested
that these interactions are direct, because recombinant Nba1
associated with recombinant Cdc24 and Bem1 (Figure 3B).
Moreover, Nba1 colocalized with Cdc24 and Bem1 at
the cell-division site and with Bem1 at the emerging CRMs
(Figures 3C-3F). In contrast, Nis1 did not associate with Cdc24 or
Bem1 in the yeast two-hybrid system (Figure 3A) or in vitro
(Figure S3C), suggesting that Nis1 does not play a direct role in
regulating Cdc24. In vitro competition assays revealed that Cdc24



Figure 1. Nba1, Nis1, and Nap1 Function with Gps1 in the Cdc42 Pathway

(A) The illustration depicts active Cdc42 (green), which leads to the growth of the daughter cell. The actomyosin ring (red) forms at the same site where Cdc42
initially established polarity. After cytokinesis, this site is marked by cytokinesis remnants (blue, bud scars).

(B) Electron micrograph of a dividing yeast cell. The enlarged box shows the cell-division site and two cytokinesis remnants. Scale bars, 1.0 μm.

(C) Immunoprecipitation using cell lysates of the indicated genotypes. TAP, tandem affinity purification tag.

(D and E) Localization of Gps1, Nap1, Nba1, and Nis1 at the cell-division site (D) and at cytokinesis remnants (CRMs) (E). CRMs were stained with calcofluor. Scale
bar, 5 μm.

(F) Quantification of (D). Only large-budded cells were counted, n > 100 cells per strain. *p < 0.0001.

(G) Quantification of (E). Only cells with CRMs were counted, n > 100 cells per strain. *p < 0.0001. See Figure S1C for additional p values of (F) and (G). Data in (F)
and (G) are represented as mean ± SEM of three independent experiments.

(H) Diagrams summarizing the dependence in protein localization at the bud neck or CRMs as scored in (F) and (G).

(I) Electron micrographs showing cross-sections of the bud-neck region (cell-division site) of wild-type (normal bud neck) and nba1Δ (collar at bud neck) cells. The
arrowhead points toward a bud-neck collar, which is indicative of Cdc42 activation at the same site of cytokinesis, thereby causing budding inside the old
cell-division site. Scale bar, 1 μm. See also Figure S1.
Figure 2. Nba1 Is Required and Sufficient to Inhibit Cdc42 Activation

(A) Schematic representation of the Gps1 fragments that specifically interact with Rho1/Cdc42, Nap1, Nis1, and Nba1 (see also Figure S2A). Numbers indicate
amino acid positions.

(B) In vitro binding assays showing the binding of GST or the indicated GST-Gps1 fragments to beads coupled to MBP-Nap1, MBP-Nba1, or MBP-Nis1.
Ponceau-S-stained nitrocellulose membranes show the inputs for MBP (top panels) and GST (middle panels) fusion proteins.

(C) Yeast two-hybrid analysis of Gps1 full-length and truncated forms lacking the indicated amino acids. Blue color denotes interaction.

(D) Quantification of Nba1 localization at the cell-division site in the indicated Gps1 truncated mutants. Only large-budded cells were counted (n > 100 per strain).
Data are represented as mean ± SEM of three independent experiments.

(E) Localization of Nba1 at CRMs. In all inspected cells (n > 100 per strain), Nba1 localized at CRMs in GPS1 but not in gps1-Δ513-598 cells. CRMs were stained
with calcofluor.

(F) Artificial tethering of Nba1-GFP in wild-type and gps1Δ cells at the cell-division site using the GFP binding protein (GBP) fused to the septin Shs1. Kymographs
of the daughter-mother axis are shown. Each pixel represents one time point. Myo1-3Cherry, which disappears from the cell-division site after cytokinesis, was
used as a cell-cycle marker.

(G) Localization of Nba1-GFP at CRMs in GPS1, gps1Δ SHS1, and gps1Δ SHS1-GBP cells carrying Myo1-3Cherry. CRMs were stained with calcofluor.

(H) Growth test showing serial dilutions of the strains with the indicated genotypes and carrying a URA3-based plasmid expressing wild-type GPS1. Cells were
spotted on SC-complete plates (maintain URA3-GPS1) and on 5FOA plates (selects against URA3-GPS1). Scale bars, 5 μm. See also Figure S2.



bound to Nba1 only when Bem1 was either absent or present in
substoichiometric amounts (Figure 3G, lanes 3-6). Similarly,
Bem1 associated with Nba1 only in the presence of low levels
of Cdc24 (Figure 3G, lanes 9-12). However, Nba1 was not able
to disassemble or to prevent the assembly of Cdc24-Bem1-Cla4
complexes in vitro (Figures S3D and S3E). Therefore, we
reasoned that Nba1 binds to Cdc24 and Bem1 to prevent a
different activation step.

We next tested whether Nba1 affects the binding of Cdc24 to
Rsr1. As previously reported, Cdc24 preferentially bound
recombinant Rsr1 that was locked in the GTP-bound state (Figure 3H,
lanes 1 and 3) (Park et al., 1997). Strikingly, Cdc24 did not
associate with GTP-bound Rsr1 in the presence of Nba1 (Figure 3H,
lane 2), suggesting that Nba1 could inhibit the activation of
Cdc24 by Rsr1. We tested this possibility in vivo using two
mutants of Cdc24: a truncated form (CDC24 codons 1-750), which
cannot bind to Bem1 (Kozubowski et al., 2008), and a point mutant
(glycine 168 substituted to glutamic acid), which is unable to bind
Rsr1 (Shimada et al., 2004). Deletion of NBA1 rescued the
growth defect of a cdc24Δ strain expressing the truncated
mutant but not the point mutant (Figures 3I and S3F),
suggesting that activation of the truncated Cdc24 by Rsr1 is not sufficient
to support growth due to negative regulation by Nba1. In
conclusion, our data support a model in which Nba1 locally prevents
activation of the guanine nucleotide exchange factor Cdc24 by
Rsr1, thereby inhibiting Cdc42 activity at the cell-division site
and CRMs (Figure 3J).

The Transmembrane Proteins Rax1 and Rax2 Anchor
Nba1 and Nis1 at CRMs

Puzzlingly, Nba1 and Nis1 directly interact with Gps1 and require
Gps1 for their localization to both the cell-division site and CRMs,
but Gps1 is only associated with the cell-division site. Nba1 and
Nis1 have no obvious transmembrane or membrane-binding
domains, suggesting they likely rely on a protein other than Gps1 as
an anchor at CRMs. Rax1 and Rax2, two transmembrane
proteins of unknown function in haploid cells, stably localize to
CRMs in an interdependent manner (Chen et al., 2000) (Figures
S4A and S4B). We found that RAX1 and RAX2 were required
for the localization of Nba1 and Nis1 to CRMs but not to the
cell-division site, without affecting the total protein levels of
Nba1 and Nis1 (Figure 4). Deletion of RAX1 or RAX2 impaired
Cdc42 inhibition to the same extent as deletion of NIS1, as judged
by the percentage of cells with multiple collar phenotype (Table
1A). Yeast two-hybrid analysis revealed that Nis1 interacted
with the C-terminal cytoplasmic tail of Rax2 in a NAP1-dependent
manner (Figures S4C-S4E). Deletion of the 37 C-terminal
residues of Rax2 (Rax2-ΔC) impaired Nis1 and Nba1 localization
to CRMs (Figures S4F and S4G). However, Rax1 also failed to
localize to CRMs in rax2-ΔC cells (Figure S4H), suggesting that
the C-terminal tail of Rax2 is critical for Rax1-Rax2 complex
formation. The recruitment of Gps1 or Nap1 to the cell-division site
was not influenced by RAX1 or RAX2 deletions (data not shown).
Together, these results place Rax1 and Rax2 in the Cdc42
inhibition pathway and indicate that Rax1 and Rax2 serve as a binding
platform for Nba1 and Nis1 at CRMs.

Nba1 and Nis1 Are Transferred from the Cell-Division
Site to the Emerging CRM

We next investigated the mechanism by which Nba1 and Nis1
localize to CRMs. We considered two possibilities: Nba1 and
Nis1 could be loaded onto CRMs from a cytoplasmic pool
(“diffusion model”) or Nba1 and Nis1 could be transferred,
directly or indirectly, from the site of cell division to Rax1-Rax2
at CRMs (“inheritance model”). Two observations suggest that
Nba1 and Nis1 must associate with Gps1 and Nap1 at the
cell-division site before it becomes a CRM with cell separation, and
thus favor the inheritance model. First, Gps1 and Nap1 are
restricted to the cell-division site, yet both proteins are required
for Nis1 and Nba1 localization to CRMs (Figures 1D-1H).
Second, artificial tethering of Nba1 and Nis1 to the cell-division
site partially restored Nba1 and Nis1 association with CRMs in
gps1Δ cells (Figures 2G and S2C). To directly examine Nba1
and Nis1 loading to CRMs, we performed a pulse-chase
experiment using cells expressing NIS1-GFP under control of the
galactose-inducible promoter (pGal1) (Figure 5A). Without
promoter induction, no Nis1-GFP could be detected in
pGal1-NIS1-GFP cells (Figure 5A, t = -2 hr). After a 2 hr pulse of
induction, Nis1-GFP was detected initially only at the cell-division site
and later at the newly emerging (proximal) but not at the older
(distal from the bud neck) CRMs (Figure 5A, t = 0 and 2 hr).
Similar results were obtained for Nis1-GFP in pulse-chase
experiments with inducible expression of NBA1 or RAX1
(Figure S5A). Moreover, Nba1-GFP behaved in an analogous
manner in similar experiments (Figures 5A and S5A). These
results indicate that Nba1 and Nis1 must associate with the
cell-division site before it is converted into a CRM.

Interestingly, Nis1-GFP and Nba1-GFP were found exclusively
at distal CRMs 8 hr after the expression pulse (Figure 5A),
suggesting that both proteins associate with CRMs in a stable
manner. Supporting this notion, analysis of protein age with a
tandem fluorescent protein timer (tFT) composed of the
fast-maturing green fluorescent protein sfGFP and the
slower-maturing red fluorescent protein mCherry (Khmelinskii et al.,
2012) showed that Nba1 and Nis1 pools are on average older at
CRMs than at the cell-division site (Figure 5B), similar to Rax2,
which is known to be stably associated with CRMs (Chen et al.,
2000; Khmelinskii et al., 2012). Moreover, in fluorescence
recovery after photobleaching experiments, Nis1-GFP showed no
detectable turnover at CRMs over a period of more than 4 hr,
whereas Nba1-GFP underwent slow turnover at CRMs
(half-time >30 min) (Figures 5C, 5D, and S5B; data not shown).
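
The logic of reading protein age from a tandem fluorescent timer can be illustrated with a toy calculation. The sketch below is not the analysis used in this study: it assumes that each fluorophore matures in a single first-order step, and the two maturation half-times (6 min for the fast green fluorophore, 40 min for the slow red one) are hypothetical placeholder values chosen only to show the trend. The function names are likewise illustrative.

```python
import math


def mature_fraction(age_min, maturation_half_time_min):
    """Fraction of fluorophores that are mature in a protein of a given age.

    Assumes one-step, first-order maturation (a simplification).
    """
    rate = math.log(2) / maturation_half_time_min
    return 1.0 - math.exp(-rate * age_min)


def tft_ratio(age_min, fast_half_time_min=6.0, slow_half_time_min=40.0):
    """Slow-to-fast (red/green) signal ratio for proteins of a given age.

    The default half-times are hypothetical, not measured values.
    """
    slow = mature_fraction(age_min, slow_half_time_min)
    fast = mature_fraction(age_min, fast_half_time_min)
    return slow / fast


# Older protein pools give a higher red/green ratio under these assumptions.
for age in (5, 30, 240):
    print(f"age {age:>3} min -> ratio {tft_ratio(age):.2f}")
```

Because the slow fluorophore lags behind the fast one, the ratio rises monotonically with protein age under these assumptions, which is why a higher ratio at CRMs than at the cell-division site is interpreted as an older protein pool.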

Collectively, our results are consistent with a model in which
Nis1 and Nba1 need to associate with the cell-division site to
become stably recruited to CRMs, where Nis1 recruits Nba1
through a series of successive steps that require Gps1, Nap1,
and Rax1-Rax2 (Figures 1H, 4, and 5E).

Cell-Division Site Reuse Impairs Nuclear Segregation 
and Shortens Replicative Lifespan 

Deletion of the gene coding for Rga1, a GTPase-activating
protein (GAP) for Cdc42, was previously reported to result in
rebudding at the most recent site of cell division (Tong et al., 2007). We
therefore tested whether Rga1 and Nba1 cooperate in
preventing the reuse of previous cell-division sites. Transmission
electron microscopy showed that the percentage of cells with
multiple collars at the bud neck was higher in the rga1Δ nba1Δ
double mutant than in either single mutant (Table 1A). We
conclude that Rga1 and Nba1 cooperate in inhibiting rebudding
at CRMs. However, whereas Rga1 prevents Cdc42 reactivation
at the most recent CRM, Nba1 also prevents Cdc42 reactivation
at older CRMs with the help of Gps1, Nap1, Nis1, Rax1, and
Rax2.

Next, we sought to determine the physiological significance of
preventing rebudding from CRMs. We followed single cells
through consecutive divisions until cell death using a
microfluidics-based long-term live cell imaging technique (Fehrmann
et al., 2013) with a GFP-tagged histone H2B (Htb2) to visualize
nuclear segregation (Figure 6A). Whereas wild-type cells divided
on average 20 times before death (replicative lifespan) in this
assay, the replicative lifespan of cells lacking NBA1 and RGA1
alone or in combination was significantly shorter (Figures 6A
and 6B). The short replicative lifespan of nba1Δ and rga1Δ cells
was clearly distinct from the frequent premature cell death
observed in gps1Δ cells (Meitinger et al., 2013). The thinning of
the secondary septum observed specifically in gps1Δ cells
(Table 1A) leads to the lysis of daughter cells during cytokinesis
Figure 3. Nba1 Directly Interacts with Cdc24 and Bem1 and Inhibits the Binding of Cdc24 to GTP-Bound Rsr1

(A) Yeast two-hybrid interaction of Nba1, Nis1, and Nap1 with Cdc24, Bem1, Cdc42, and Gps1. The blue color indicates interaction.

(B) In vitro binding assay using recombinant proteins. Immobilized 6His-Nba1 was incubated with MBP alone, MBP-Cdc24, or MBP-Bem1 as indicated. The
14-3-3 family protein Bmh1 was used as a negative control.

(C and D) Kymographs of the daughter-mother cell axis show the colocalization of Cdc24 and Nba1 (C) or Bem1 and Nba1 (D) at the cell-division site. Each pixel
represents one time point. Nba1 appears at the cell-division site before Cdc24 (7.1 ± 2.0 min, n = 15) or Bem1 (8.1 ± 1.7 min, n = 17). Arrowhead, appearance of
Nba1 at the cell-division site; *, appearance of Cdc24 and Bem1 at the cell-division site; **, cell separation.

(E and F) Time-lapse series showing top view of Nba1-GFP (E) or Bem1-GFP (F) at the cell-division site during actomyosin ring contraction (Myo1-3Cherry).

(legend continued on next page)
Figure 4. The Transmembrane Proteins Rax1 and Rax2 Target Nba1 and Nis1 to Cytokinesis Remnants

(A and B) Localization of Nba1-GFP and Nis1-GFP at the cell-division site (A) or CRMs (B) in wild-type, rax1Δ, and rax2Δ cells (n > 100 for each strain). Arrows point
to CRMs. CRMs were stained with calcofluor. Scale bar, 5 μm.

(C) Immunoblots showing the protein amounts of Nba1-GFP and Nis1-GFP in wild-type, rax1Δ, and rax2Δ cells using anti-GFP antibodies. An unspecific band
served as a loading control. See also Figure S4.



(Meitinger et al., 2013). As a result, an asynchronous gps1Δ (but
not nba1Δ, rga1Δ, or nba1Δ rga1Δ) population contained ~30%
of dead cells and exhibited reduced doubling time (Figures S6A
and S6B).

We thus examined the premature cell death phenotype of the
nba1Δ rga1Δ mutant in more detail. We determined the time
that it took to segregate the nucleus into the daughter cell
body in each cell division and complete anaphase throughout
the course of replicative aging. Wild-type cells completed
nuclear segregation in less than 20 min, with some sporadic
delays (Figure 6C). Nuclear segregation delays became more
frequent in wild-type cells after entry into senescence, which
was scored as a sudden increase in the duration of the cell cycle
(Fehrmann et al., 2013) (Figures 6C and 6D). In contrast, delayed
nuclear segregation was frequent even in young (i.e.,
nonsenescent) nba1Δ rga1Δ cells and often immediately preceded cell
death (Figures 6A, 6C, and 6D), suggesting that these cells
died prematurely as a consequence of a defective mitosis.
Consistently, nuclear segregation into the daughter
compartment frequently failed in nba1Δ rga1Δ cells, resulting in
binucleated cells (Figures 6C and 6D).

How could reuse of previous cell-division sites cause defects
in nuclear segregation? We observed that the diameter of the
bud-neck opening was narrower in cells with multiple collars,
which underwent rebudding from CRMs (Figures 6E and 6F).
Moreover, the replicative lifespan of the nba1Δ, rga1Δ, and
nba1Δ rga1Δ cells negatively correlated with the frequency of
rebudding from CRMs, as judged by the percentage of cells
with bud-neck collars (Figure 6G). We reasoned that narrowing
of the bud neck due to repeated rebudding from the same site
could hinder nuclear segregation into the daughter cell, thus
explaining the nuclear segregation defects and shortened lifespan
of nba1Δ rga1Δ cells. Supporting this hypothesis, deletion of
RSR1, which results in Cdc24 mislocalization and random
budding (Chant and Herskowitz, 1991), reduced the percentage
of nba1Δ rga1Δ cells with bud-neck collars (Table 1A) and
significantly rescued the survival rate of nba1Δ rga1Δ cells
(Figure 6H).

Collectively, our data strongly suggest that the relocation of 
the site of bud growth makes a significant contribution to the 
fidelity of nuclear segregation and is required to ensure a robust 
replicative lifespan. 

DISCUSSION 

Although it has long been recognized that Cdc42 activation is never
established twice at the same site in yeast, the underlying molecular
mechanisms and physiological significance of such regulation
have remained elusive. Here, we show that CRMs of budding yeast are
marked by Cdc42 inhibitory complexes. We identified the protein
Nba1 as a core subunit of these complexes and show that
Nba1 directly regulates Cdc42. In vitro analyses combined with
in vivo functional studies establish that Nba1 spatially inhibits
the activation of Cdc42 by impeding the interaction between
the Cdc42 GEF, Cdc24, and its upstream activator, the GTPase
Rsr1. This study therefore reveals Nba1 as a player that



(G) In vitro competition assay using bacterially purified proteins. Left panel: immobilized Nba1 was incubated with buffer only (lane 1), myc3-Bem1 (lane 2), and
myc3-Cdc24 in the absence (lane 3) or presence of increasing amounts of Bem1 (lanes 4-6). Right panel: same as before but using a constant amount of
myc3-Bem1 and variable amounts of myc3-Cdc24. Asterisk, unspecific protein.

(H) Nba1 interrupts the interaction of GTP-Rsr1 with Cdc24. Immobilized GST-Rsr1 was incubated with GTPγS (lanes 1 and 2), GDP (lane 3), or buffer only (lane 4).
GST-Rsr1 beads were incubated with myc3-Cdc24 protein extract in the absence (lane 1) or presence (lane 2) of 6His-Nba1. GST only was used as a negative
control (lane 4). Myc3-Cdc24 was detected using anti-myc antibodies.

(I) Growth of cdc24Δ NBA1 URA3-CDC24 or cdc24Δ nba1Δ URA3-CDC24 cells carrying a LEU2-based plasmid containing full-length CDC24, a truncated cdc24 (CDC24
codons 1-750), or a cdc24 point mutant (glycine 168 substituted to glutamic acid) as indicated. Serial dilutions of cells were spotted on SC-LEU (URA3-CDC24 is retained)
and 5FOA-LEU2 (URA3-CDC24 is lost) plates and incubated at 23°C or 37°C for 3 days (see Figure S4E for growth at different temperatures).

(J) Summary of Nba1 interactions. Our data indicate that Nba1 binds directly to Bem1 and Cdc24 and inhibits Cdc24 binding to Rsr1 at CRMs. See also Figure S3.



Cell 159, 1056-1069, November 20, 2014 ©2014 Elsevier Inc.






[Figure 5 image. Recoverable panel labels: (A) time diagram with galactose (induction) and glucose (repression), time [h] from -2 to 8; GFP and unsp. bands; GFP signal at cell division site and at CRMs; CRM occupancy (all, none, proximal, distal); phase, GFP, CRMs, merge. (B) sfGFP and mCherry ratio. (C) CRMs, Nis1-GFP, merge. (D) CRMs, Nba1-GFP, merge.]



Figure 5. Nba1 Is Inherited at Cytokinesis Remnants by a Two-Step Mechanism 

(A) Pulse-chase experiments using NIS1-GFP, pGal1-NIS1-GFP, NBA1-GFP, or pGal1-NBA1-GFP cells were performed as illustrated in the time diagram. The
expression of genes under control of the galactose promoter was induced for a period of 2 hr, allowing the accumulation of the corresponding protein. After
repression of the galactose promoter (addition of glucose, t = 0), the localization of the indicated GFP-tagged proteins was analyzed over time. CRMs were
stained with calcofluor. Newly formed CRMs are located proximal to the newly emerging daughter cell (blue), whereas older ones are located more distally (red). Protein

(legend continued on next page) 
constrains Rsr1 function and acts alongside landmark proteins,
which are involved in the spatial control of bud site selection
(Casamayor and Snyder, 2002). Our data further establish that the
inhibitory mode of Nba1 action upon Cdc42 differs from the control
of GTP hydrolysis that is orchestrated by the GAP protein
Rga1. Accordingly, epistasis studies established that Rga1 and
Nba1 collaborate in the inhibition of Cdc42, because cells lacking
both proteins reused the same polarity site for Cdc42 activation
with a higher frequency than cells lacking either regulator alone.
However, in contrast to Rga1, which is confined to the site of
cell division, Nba1 remains associated with older CRMs. Our
data thus indicate that Nba1 functions as a molecular spatial
memory that confers protection from recurrent Cdc42 polarization
at the same site. We propose that CRMs are marked with
“negative polarity cues” that prevent Cdc42 reactivation.

We previously reported that Gps1 binds Cdc42 and prevents
Cla4-mediated polarity establishment at the cell-division site
(Meitinger et al., 2013). However, it was not clear how this inhibition
was achieved on a molecular level. We found that Gps1
directly interacts with Nba1 and Cdc42. Because Gps1 is
dispensable for the inhibition of Cdc42-Cla4-mediated polarity
establishment upon artificial targeting of Nba1 to the cell-division
site (Figure 2), the inhibition of Cla4 by Gps1 is most likely a
consequence of the upstream inhibition of Cdc42 that occurs
at the level of Cdc24. We thus favor the idea that Gps1 acts as
a scaffold protein at the cell-division site that supports local
inactivation of Cdc42 by Nba1.

We describe an elaborate system of three successive steps
(recruitment, transfer/inheritance, and maintenance) that are
absolutely required to load Nba1 onto CRMs. First, Gps1 recruits
Nba1 and the adaptor protein Nis1 to the cell-division site. At this
site, the chaperone Nap1 is essential for the transfer of Nba1 and
Nis1 from Gps1 to the transmembrane proteins Rax1-Rax2.
Nba1 then remains associated with Rax2 at CRMs in a
Nis1-dependent manner. Based on localization and deletion analysis,
we propose the existence of two complexes that associate with
CRMs of different ages. The Gps1/Nba1/Nis1/Nap1 complex is
present only at emerging CRMs (cell-division site), whereas the
Rax1/Rax2/Nis1/Nba1 complex associates with older CRMs
and maintains this association over multiple generations.
Importantly, disruption of Nba1 at emergent CRMs causes a more
severe Cdc42 inhibition phenotype than Nba1 removal from
established older remnants (e.g., in nis1Δ, nap1Δ, rax1Δ, or rax2Δ
cells). We therefore reasoned that inhibition of Cdc42 at the most
recent site of cell division is more critical than its inhibition at
older CRMs. This might explain why cells also rely on the GAP
Rga1 to prevent Cdc42 activation at the cell-division site (Tong
et al., 2007).

Why have cells developed such an elaborate mechanism to
maintain and protect CRMs? In the absence of Nba1 and/or
Rga1, the replicative lifespan of yeast cells was significantly
shorter than in wild-type controls. This shortening of lifespan
correlated with nuclear segregation defects that, in most cases,
immediately preceded cell death. We previously reported that
cells lacking GPS1 had an increased rate of cell death that was
exacerbated by codepletion of RGA1 (Meitinger et al., 2013).
However, several lines of evidence indicate that cell death of nba1Δ
rga1Δ and gps1Δ cells is caused by different mechanisms.
First, nba1Δ rga1Δ cells do not show a defect in cell growth as
gps1Δ cells do (Figure S6). Second, in contrast to gps1Δ mutants,
whose daughter cells selectively die during cytokinesis
(Meitinger et al., 2013), we mainly observed cell death of
large-budded nba1Δ rga1Δ cells shortly after binucleation (Figure 6).
Third, the growth defect of gps1Δ cells is based on the misregulation
of the Rho1 pathway (Meitinger et al., 2013), which is functional
in nba1Δ rga1Δ cells (Table 1). We thus propose that the
narrowing of the bud-neck diameter by repeated reuse of the
same polarity site is the most likely cause of nuclear segregation
defects and cell death in nba1Δ rga1Δ cells. In support of
this assumption, we found that deletion of RSR1, which
decreased the probability of reuse of the cell-division site for
bud growth, improved the survival rate of nba1Δ rga1Δ cells.
We thus propose that the combined impact of the different
Cdc42 cell polarity control mechanisms guarantees genome
stability and the fitness of dividing cells. In light of our study,
many open questions remain. For example, it is unclear why
yeast cells developed such a complex system for protecting
CRMs against Cdc42 repolarization when other cell types simply
remove CRMs (Pohl and Jentsch, 2009). It also raises the question
of whether CRMs may play other mother cell-specific
functions that are yet to be appreciated, perhaps during the
aging process itself.



levels were analyzed using anti-GFP antibodies. An unspecific band served as a loading control. The graphs represent the quantification of the percentage of
large-budded cells showing Nba1-GFP at the cell-division site or CRMs (only cells with CRMs were counted). Images show representative cells analyzed at early
(2 hr) or late (8 hr) time points after repression of the galactose promoter. The graphs on the right indicate the percentage of cells with the given GFP-tagged
protein at none, all, only proximal, or only distal CRMs (“CRM occupancy”). Only cells with more than one CRM were counted, n > 100 cells per time point. CRMs
were stained with calcofluor.

(B) Nba1, Nis1, and Rax2 are fused to the tandem fluorophore sfGFP-mCherry. The ratios between mCherry and sfGFP fluorescence intensities at the cytokinesis
remnants (Rcrm) and the cell-division site (Rneck) are used to determine the age of the different protein pools. The graph shows the logarithm of the ratio of Rcrm
and Rneck. log(Rcrm/Rneck) > 0 indicates that the protein pool at cytokinesis remnants is older than at the cell-division site.

(C and D) Fluorescence recovery of Nis1-GFP (C) and Nba1-GFP (D) at CRMs 4.5 hr after photobleaching. The numbers on the right indicate CRMs pre- and
postbleach; the green color indicates the presence of the GFP fusion protein at the given CRM. CRMs were stained with calcofluor.

(E) Model for the timely loading of Nba1 and Nis1 to CRMs based on pulse-chase (A), age determination (B), FRAP experiments (C and D), and localization studies
(Figure 1). The aforementioned data indicate that Gps1 is recruited to the newly formed bud neck and recruits Nba1 and Nis1 to the cell-division site of
large-budded cells. After cell separation, Nis1 and Nba1 but not Gps1 or Nap1 remain stably associated with the emerging CRM as the new bud emerges. Each color
represents one generation in which the proteins are attached to CRMs.

(F) Model for the formation of a Gps1-dependent complex at the cell-division site. During cytokinesis, Nba1 and Nis1 are transferred to the transmembrane protein
Rax2, which depends on the chaperone Nap1. Both complexes are necessary to prevent Cdc42 activation at the cell-division site and at CRMs. CW, cell wall; PM,
plasma membrane and secondary septum are indicated. Scale bars, 5 μm. See also Figure S5.
Figure 6. Inhibition of Cdc42 at Cytokinesis Remnants Contributes to the Fidelity of Nuclear Segregation and Longevity

(A) Time-lapse series of wild-type (WT) and nba1Δ rga1Δ single cells from birth to death in the microfluidic device at the indicated time points. The histone Htb2,
tagged with GFP, was used as a nuclear marker. The white arrows indicate the cells that became trapped in the cavity. Red numbers indicate how many buds the
cell generated since birth. Red arrows mark nuclear segregation delays and binucleated cells.

(B) Survival probability curves of wild-type (WT, n = A7), nba1Δ (n = 2>Q), rga1Δ (n = 25), and nba1Δ rga1Δ (n = 21) cells. The median survival is indicated. All mutants
display a significantly reduced longevity compared to WT (p < 0.001).

(C) Temporal trajectories showing successive divisions of single cells. Green, red, and blue colors indicate normal anaphase (<20 min), delayed anaphase
(>20 min), and binucleated cells, respectively. Dark colors indicate when cells have entered senescence.

(D) Quantification of the probability per generation of delayed nuclear segregation (>20 min, red bars) or binucleation (blue bars) to occur. The difference between
nba1Δ rga1Δ and young wild-type cells was significant (p < 0.001). Senescent wild-type cells were scored separately.

(E) TEM transverse (top view) and longitudinal (side view) sections of nba1Δ cells with four collars at the bud-neck region, indicative of recurrent rebudding at
the same location.

(F) Quantification of the bud-neck diameter of nba1Δ, rga1Δ, and nba1Δ rga1Δ cells with different numbers of bud-neck collars (no collar, n = 20; one collar, n =
18; two collars, n = 18). Data are represented as mean ± SEM. *p < 0.001.

(G) Graph depicting the percentage of cells with multiple collars (determined in F) versus the median lifespan (determined in B) for the indicated strains.

(H) Percentage of surviving cells at eight generations for the indicated cell types was calculated based on single-cell analysis performed as described in (A) (n = 40
for each strain). Note that, due to the random budding pattern of rsr1Δ cells, the time that the mother cell remained in the inspection chamber was limited to seven
to nine generations. *p < 0.001. See also Figure S6.
Molecular remnants of the cell-division site persist at the
plasma membrane of daughter cells in several organisms and
do so in a cell-type-specific manner. For example, stem cells
mimic budding yeast in accumulating division remnants in
successive cell divisions (Chen et al., 2013). Thus, it is tempting to
speculate that also in other organisms the cytokinetic machinery
may act as a platform that allows the specific and asymmetric
inheritance of associated protein complexes, which might affect
cell fate determination. CRMs have recently been implicated in
determining the spatial orientation and establishment of cell
polarity in neurons (Pollarolo et al., 2011; Wilcock et al., 2007).
Interestingly, Cyk4 (a metazoan inhibitor of the Rho GTPase Rac1)
persists at cell-division remnants of C. elegans (Bastos et al.,
2012; Jantsch-Plunger et al., 2000), and this persistence has
been implicated in positioning the centrosome and cell-division
plane during early embryonic cell divisions (Hyman, 1989;
Keating and White, 1998; Waddle et al., 1994). Thus, it is tempting to
speculate that CRMs play a conserved role in affecting cell polarity
by acting as a spatial memory in a variety of organisms
ranging from yeast to higher eukaryotes.

EXPERIMENTAL PROCEDURES 

Yeast Methods and Plasmid Construction 

Yeast strains and plasmids used in this study are listed in Tables S1 and S2.
For bacterial expression, genes or gene fragments were cloned into
pGEX-5X-1 (GST, GE Healthcare), pET28c (6His, Novagen), and pMAL-c2x (MBP,
NEB). Yeast growth and media were as described (Sherman, 1991). Gene
deletions and epitope tagging were performed using PCR-based methods
(Janke et al., 2004; Knop et al., 1999). Genes of interest were expressed
from their endogenous promoter unless specified otherwise. Yeast strains
were grown in yeast peptone dextrose medium containing 0.1 mg/l adenine
(YPAD). Strains carrying plasmids were grown in synthetic complete (SC)
media lacking the corresponding amino acids. Loss of URA3-containing
plasmids was assessed by using plates containing 1 mg/ml 5-fluoroorotic acid
(5FOA).

Genetic Interactions Based on Growth 

The plasmid shuffle strategy was used to test the viability of single and double
mutants (genetic interaction). Briefly, mutant strains containing the
corresponding wild-type gene on a URA3-based plasmid (pRS316) were
analyzed for growth on 5FOA plates (selection against URA3). At least six
individual transformants were analyzed for each mutant, and one
representative mutant is shown. For each mutant, 10-fold serial dilutions of wild-type
(ESM356-1) and mutants with the indicated genotype were spotted onto
SC and 5FOA plates. Mutants were complemented by pRS316-GPS1
(Figures 2H and S2D) or pRS316-CDC24 (Figures 3I and S3G). Strains carrying
genes on a LEU2-based plasmid (Figures 2H, 3I, and S3G) were grown on
medium lacking leucine to avoid plasmid loss. Plates were incubated for 1
or 2 days at 30°C.

Protein Detection Methods and Quantifications 

Yeast protein extracts and immunoblotting were performed as described
(Janke et al., 2004). Antibodies were rabbit anti-GFP antibody, mouse
anti-tubulin (Tub1), mouse anti-HA (clone 12CA5, Sigma), mouse anti-Myc (clone
9E10, Sigma), rabbit anti-Clb2, rabbit anti-TAP (Open Biosystems), mouse
anti-MBP (NEB), and guinea pig anti-GST. Secondary antibodies were goat
anti-mouse, goat anti-rabbit, and goat anti-guinea pig immunoglobulin Gs
coupled to horseradish peroxidase (Jackson ImmunoResearch Laboratories).
Protein concentrations were determined using Bradford reagent according
to the manufacturer's instructions (Bio-Rad). Immunoprecipitation, in vitro binding,
and yeast two-hybrid assays are described in Extended Experimental
Procedures.



Microscopy Techniques 

For fluorescence still image analysis, cells carrying GFP or Cherry fusion
proteins were inspected without fixation (Figures 1D, 1E, 2A-2D, 3C-3F,
5B-5D, S1B, S2B, S4A, S4B, S4F-S4H, S5B, and S6A) or after fixation
with 4% formaldehyde for 20 min (Figures 2E, 2G, 4A, 4B, 5A, S2C, and
S5A). CRMs were stained with 0.1 mg/ml calcofluor (Fluorescence
Brightener 28, Sigma-Aldrich). Live-cell imaging and quantification of fluorescence
still images were performed as described in Extended Experimental
Procedures. The tandem fluorescent protein timer experiment for Rax2, Nba1, and
Nis1 was performed and quantified as described in Extended Experimental
Procedures (Khmelinskii et al., 2012). Long-term FRAP analysis for Nba1 and
Nis1 was performed in observation chambers of a Y4C microfluidic plate
(CellAsic) at 30°C as described in Extended Experimental Procedures
(Khmelinskii et al., 2012). Specimens for electron microscopy were prepared
as described previously in Extended Experimental Procedures (Meitinger
et al., 2011).
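
The tandem fluorescent timer readout summarized in the Figure 5B legend compares the mCherry/sfGFP intensity ratio at CRMs with the same ratio at the cell-division site. The short Python sketch below illustrates that comparison only; the function names, the input format, and the use of a base-10 logarithm are assumptions made for illustration and are not taken from the Extended Experimental Procedures.

```python
import math


def timer_ratio(mcherry: float, sfgfp: float) -> float:
    """Return the mCherry/sfGFP intensity ratio measured at one location."""
    if mcherry <= 0 or sfgfp <= 0:
        raise ValueError("intensities must be positive")
    return mcherry / sfgfp


def log_age_ratio(crm: tuple, neck: tuple) -> float:
    """Return log10(R_crm / R_neck).

    `crm` and `neck` are (mCherry, sfGFP) intensity pairs (hypothetical
    input format). A value above 0 means the pool at the cytokinesis
    remnant is older than the pool at the cell-division site.
    """
    r_crm = timer_ratio(*crm)
    r_neck = timer_ratio(*neck)
    return math.log10(r_crm / r_neck)


if __name__ == "__main__":
    # Made-up intensities: relatively more mCherry signal at the CRM.
    value = log_age_ratio(crm=(200.0, 100.0), neck=(100.0, 100.0))
    print("older pool at CRM" if value > 0 else "pool at CRM not older")
```

The sign of the result does not depend on the logarithm base, so the interpretation (greater than 0 means an older pool at CRMs) holds whichever base was actually used.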

Replicative Lifespan Analysis in a Microfluidic Device

Time-lapse images (phase contrast and fluorescence) were acquired using a
microfluidic device as described previously (Fehrmann et al., 2013) using an
automated epifluorescence microscope (Nikon Eclipse Ti) with a digital
CMOS camera (Hamamatsu Orca Flash 4.0). Cells were grown in synthetic
complete medium and then were loaded into the device and allowed to divide
until death, typically over 100 hr. Constant media perfusion kept
the cells in exponential growth during the course of the experiment. Replicative
lifespans were analyzed from time-lapse images (one frame every 10 min).
Survival curves were calculated from the number of buds produced by individual
mother cells. Nuclear segregation defects were scored using a nuclear marker
(Htb2-GFP).
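
As an illustration of how a survival curve can be derived from bud counts, the Python sketch below computes, for each generation, the fraction of mother cells that produced at least that many buds, together with the median lifespan. It is a minimal sketch under stated assumptions: every mother cell is followed until death (no censoring), the function names are hypothetical, and the example counts are made up rather than taken from this study.

```python
from statistics import median


def survival_curve(bud_counts: list) -> list:
    """Return S(g) for g = 0..max lifespan.

    S(g) is the fraction of mother cells that produced at least g buds,
    i.e. that were still alive when reaching generation g. Assumes that
    every cell was observed until death (no censored observations).
    """
    if not bud_counts:
        raise ValueError("at least one mother cell is required")
    n = len(bud_counts)
    longest = max(bud_counts)
    return [sum(1 for b in bud_counts if b >= g) / n for g in range(longest + 1)]


def median_lifespan(bud_counts: list) -> float:
    """Return the median number of buds produced per mother cell."""
    return median(bud_counts)


if __name__ == "__main__":
    example = [2, 4, 4, 6]  # made-up bud counts for four mother cells
    for generation, fraction in enumerate(survival_curve(example)):
        print(generation, fraction)
    print("median lifespan:", median_lifespan(example))
```

If some cells leave the observation chamber before dying, as noted for rsr1Δ cells in Figure 6H, this simple fraction no longer applies and an estimator that handles censoring would be needed instead.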

SUPPLEMENTAL INFORMATION 

Supplemental Information includes Extended Experimental Procedures, six
figures, and two tables and can be found with this article online at
http://dx.doi.org/10.1016/j.cell.2014.10.014.
SUMMARY

Hematopoietic stem cells (HSCs) underlie the production
of blood and immune cells for the lifetime of an
organism. In vertebrate embryos, HSCs arise from
the unique transdifferentiation of hemogenic endothelium
comprising the floor of the dorsal aorta during
a brief developmental window. To date, this process
has not been replicated in vitro from pluripotent precursors,
partly because the full complement of
required signaling inputs remains to be determined.
Here, we show that TNFR2 via TNFα activates the
Notch and NF-κB signaling pathways to establish
HSC fate, indicating a requirement for inflammatory
signaling in HSC generation. We determine that primitive
neutrophils are the major source of TNFα, assigning
a role for transient innate immune cells in
establishing the HSC program. These results demonstrate
that proinflammatory signaling, in the absence
of infection, is utilized by the developing embryo to
generate the lineal precursors of the adult hematopoietic
system.

INTRODUCTION 

In all vertebrate animals studied, the homeostasis of adult blood 
and immune cells is ultimately maintained by rare subsets of he- 
matopoietic stem cells (HSCs) (Kondo et al., 2003). During a brief 
window during embryonic development, these HSCs arise de 
novo from hemogenic endothelium comprising the floor of the 
dorsal aorta (DA) (Bertrand et al., 2010a; Boisset et al., 2010; 
de Bruijn et al., 2000; Kissa and Herbomel, 2010) in a process 
that appears to be conserved among all vertebrates (Clements 
and Traver, 2013; Godin and Cumano, 2002). A more complete 
understanding of the signaling pathways that instruct HSC emer- 
gence could in principle inform in vitro approaches utilizing 
pluripotent precursors to create patient-specific HSCs (Kyba 
and Daley, 2003). Despite decades of efforts, this goal has not 
yet been achieved, in part due to an incomplete understanding 
of the native molecular cues needed to establish HSC fate. 
One known requirement for HSC emergence is signaling
through the Notch pathway (Bigas et al., 2013). Notch regulates
many forms of intercellular communication, underlying many
cell-fate decisions, including key roles in embryonic patterning
(Kopan and Ilagan, 2009). Although the role of Notch in the
maintenance and function of adult HSCs appears to be dispensable
(Bigas and Espinosa, 2012), Notch signaling is absolutely
required in the embryonic specification of HSCs in both the
mouse (Bigas and Espinosa, 2012) and zebrafish (Bertrand
et al., 2010b). In mice, the Notch receptor Notch1 (Kumano
et al., 2003) and the Notch ligand Jagged1 (Jag1) are required
for HSC specification (Bigas et al., 2010). It is important to note
that, because Notch signaling is also indispensable for arterial
specification (Quillien et al., 2014) and because HSCs derive
directly from the aortic floor, it has been difficult to distinguish
whether Notch signaling regulates HSC emergence independently
from its role in upstream arterial specification. Recent
studies in Jag1-deficient mice have demonstrated HSC defects
in the presence of normal arterial development, suggesting that
these Notch requirements may be distinct and separable.
Recent studies have also demonstrated that Notch signaling is
required intrinsically within HSCs or their precursors
(Robert-Moreno et al., 2008) via function of the Notch1 receptor (Hadland
et al., 2004), suggesting that Jag1 may be a specific ligand of
Notch1 in the specification of HSCs.

Tumor necrosis factor α (TNFα) is a powerful proinflammatory
cytokine that plays a pivotal role in the regulation of inflammation
and immunity. TNFα exerts its functions via engagement of one of
two specific cell surface receptors (TNFRs), namely the 55 kDa
TNFR1 (also known as TNFRSF1A) and the 75 kDa TNFR2 (also
known as TNFRSF1B) (Shalaby et al., 1990). TNFR1 is expressed
in most cell types, whereas TNFR2 is restricted to immune and
endothelial cells (Aggarwal, 2003). Whereas TNFα signaling
regulates aspects of adult hematopoiesis (Mizrahi and Askenasy,
2014), a potential role in the developmental specification of
HSCs has not been addressed. However, it has been reported
that TNFα and its receptors are highly expressed in the murine
yolk sac and fetal liver, suggesting a possible role for this
inflammatory cytokine in embryonic hematopoiesis (Kohchi et al., 1994).

Nuclear factor-kappa B (NF-κB) is a ubiquitous, inducible
transcription factor that is activated by a diverse number of stimuli,
including TNFα (Ahn and Aggarwal, 2005; Brown et al., 2008). A
multitude of downstream targets, as well as upstream inducers,
position NF-κB as a general sensor of cell stress. TNFα signaling
through TNFR2 is a well-known activator of NF-κB (Aggarwal
et al., 2012; Faustman and Davis, 2010). TNFα activates NF-κB
through its canonical pathway, in which IκBs (NF-κB inhibitors)
are phosphorylated, ubiquitinated, and degraded, releasing
NF-κB dimers that then translocate to the nucleus to bind specific
NF-κB DNA binding sites to activate gene expression (Brown
et al., 2008). A direct role of NF-κB in HSCs has not been
extensively studied, although recent reports indicate that NF-κB
positively regulates the transcription of genes involved in the
maintenance and homeostasis of hematopoietic stem and progenitor
cells (HSPCs) (Stein and Baldwin, 2013), as well as their
microenvironmental interactions (Zhao et al., 2012). Whether or not NF-κB
is important in HSC emergence has not been investigated.

TNFα and TNFRs (Tnfa and Tnfrs utilizing zebrafish nomenclature)
are well conserved in all vertebrate organisms (Wiens and
Glenney, 2011), and we previously demonstrated that zebrafish
Tnfa interacts with Tnfr1 and Tnfr2 (Espin et al., 2013). Recent
studies in the zebrafish indicate that zebrafish Tnfa functions as
a proinflammatory cytokine by activating endothelial cells (Roca
et al., 2008). Additionally, the genetic inhibition of Tnfrs identified
an essential role for Tnfa signaling in the development and
maintenance of endothelial cells (Espin et al., 2013). Because HSCs
arise from hemogenic endothelial cells, we queried whether
TNF signaling plays a role in HSC emergence. In the present
study, we demonstrate a previously unappreciated requirement
for TNF signaling in the generation of HSCs. We also show that
NF-κB is active in nascent HSCs and that this activation is
essential for HSC emergence. Finally, we identify primitive neutrophils
as a key source of Tnfa, assigning these cells a previously
unidentified role in HSC development. In summary, we report an
important role for inflammatory signaling in the birth of the adult
hematopoietic system that is mediated by the proinflammatory
cytokine Tnfa, the inflammatory transcription factor NF-κB, and
the Notch signaling pathway under nonpathogenic conditions.

RESULTS 

Tnfa Signaling through Tnfr2 Is Required for Definitive, 
but Not Primitive, Hematopoiesis 

We previously demonstrated that Tnfa is required for embryonic
blood vessel development (Espin et al., 2013). Because HSCs
are generated from arterial vessels in the embryo (Bigas et al.,
2013), we investigated whether this proinflammatory cytokine
also played a role in HSC development. To address this question,
we isolated kdrl+ endothelial cells by fluorescence-activated cell
sorting (FACS) from 26 hr postfertilization (hpf) transgenic
kdrl:mCherry embryos and performed quantitative PCR (qPCR) for
tnfr1 and tnfr2. Both transcripts were enriched in these cells
compared to the whole embryo (Figure S1A available online).
Sorted cells expressed high levels of endogenous kdrl and
were negative for the muscle-specific myod gene, demonstrating
the purity of the sorted cells (Figure S1B). To investigate whether
Tnfa signaling was required for HSC specification, we performed
loss-of-function experiments for Tnfa and its two receptors, Tnfr1
and Tnfr2, utilizing specific antisense morpholinos (MOs) (Espin
et al., 2013). In the zebrafish embryo, HSCs can be visualized
along the axial vessels by expression of cmyb using
whole-mount in situ hybridization (WISH) (Burns et al., 2005). The number
of cmyb+ cells in or near the floor of the DA was significantly
reduced in Tnfa- and Tnfr2-deficient embryos compared with
their wild-type (WT) siblings (Figures 1A and 1B). However, loss
of Tnfr1 showed no effect on HSC number, and its simultaneous
depletion with Tnfr2 was not significantly different from loss of
Tnfr2 alone (Figures 1A and 1B), indicating that the action of
Tnfa through Tnfr2, but not Tnfr1, is important in HSC development.
This result was supported by quantitation of cd41:eGFP+
HSPCs (Bertrand et al., 2008) using flow cytometry, which
were significantly decreased in Tnfr2- and Tnfa-deficient fish
at 3 days postfertilization (dpf) (Figure 1C).

To further confirm the reduction of HSCs in Tnfr2- and
Tnfa-deficient embryos, we directly visualized emerging HSCs from
the floor of the DA in kdrl:mCherry; cmyb:GFP double transgenic
embryos (Bertrand et al., 2010a) at 48 hpf by confocal
microscopy (Figure 1D). Consistent with the results above, the
number of double-positive kdrl+; cmyb+ HSCs in the floor of
the DA was reduced ~50% when compared to control embryos
(Figures 1D and 1E), unaffected in Tnfr1-deficient embryos, and
showed a similar 50% decrease in Tnfr1+Tnfr2 double-depleted
embryos (Figure S1C). These reductions could be due to a defect
in the initial specification of HSCs or in their subsequent
maintenance. To distinguish between these possibilities, we performed
WISH for the nascent HSC marker runx1 at earlier time points.
Both Tnfr2- and Tnfa-deficient embryos showed significant
reduction in the number of runx1+ cells in the aortic floor at 24,
28, and 36 hpf (Figures S1D and S1E), indicating that the functions
of Tnfa and Tnfr2 are important during the earliest steps
of HSC specification.

We next examined subsequent developmental stages for
possible roles of Tnfa in the maintenance of nascent HSCs.
To determine whether Tnf receptor expression is modulated
following HSC specification, we purified kdrl+; cmyb− endothelial
cells and kdrl+; cmyb+ HSCs from 48 hpf kdrl:mCherry;
cmyb:GFP embryos by FACS. qPCR analysis showed that, whereas
tnfr1 mRNA levels were similar in HSCs and endothelial cells,
tnfr2 transcripts markedly increased in HSCs (Figure 1F). As this
result suggested that Tnfr2 may play a role in HSC maintenance,
we analyzed changes in HSC number in individual embryos over
time. The number of cmyb+; kdrl+ cells in WT animals expanded
between 36 and 48 hpf, whereas Tnfr2- or Tnfa-deficient siblings
showed similar numbers of HSCs at either time point (Figures 1G
and 1H). Together, these results suggest that Tnfa signaling
through Tnfr2 is important both in the first steps of HSC specification
and in their subsequent maintenance following emergence
from the aortic endothelium. Finally, we examined later larval
stages by monitoring the expression of rag1 and lck, two genes
expressed in developing thymocytes (Langenau et al., 2004),
because the T cell lineage derives exclusively from HSCs
(Bertrand et al., 2008; Gering and Patient, 2005). Expression of rag1
was completely or nearly absent, respectively, in Tnfr2- and
Tnfa-deficient animals at 4 dpf (Figure 1I). However, the thymic
anlage developed normally in all morphants, as assessed by the
expression of the thymic epithelial marker foxn1 (Figure 1J).
These results were further verified utilizing lck:eGFP transgenic
animals to track T cell development (Langenau et al., 2004).
Figure 1. Tnfa and Tnfr2 Are Required for HSC Generation

(A) Standard control (Std), Tnfr1, Tnfr2, Tnfa, and Tnfr1+Tnfr2 morphants were examined by WISH for cmyb expression in the aortic floor at 48 hpf. White
arrowheads denote cmyb+ HSPCs.

(legend continued on next page) 
T cells were absent in Tnfr2- and Tnfa-deficient larvae at 4 dpf,
whereas Tnfr1-deficient siblings showed normal T cell development
(Figure S1F). Together, these results indicate that Tnfa signals
via Tnfr2 and that this signaling pathway is important both for
early specification and subsequent maintenance of HSC fate,
such that the lineage is apparently lost by 4 dpf.

To further dissect the role of Tnfa signaling in hematopoiesis,
we assessed whether Tnfa and its receptors were required for
the first waves of hematopoiesis, commonly referred to as “primitive”
due to the transience of these cells and lack of upstream
multipotent progenitors. In zebrafish, primitive hematopoiesis
generates macrophages, neutrophils, and erythrocytes. The
expression of csf1ra, a specific marker of macrophages
(Herbomel et al., 2001), was unaffected in Tnfa-, Tnfr1-, and
Tnfr2-deficient embryos at 24 hpf (Figure S1G). Additionally, primitive
neutrophils were unaffected at 30 hpf, as assayed using transgenic
mpx:eGFP animals (data not shown). Similarly, primitive
erythropoiesis, assessed by expression of the erythroid-specific
transcription factor gata1a at 24 hpf, was unaffected in morphant
embryos (Figure S1G). Overall, these results indicate that
Tnfa signaling is dispensable for primitive hematopoiesis and
indispensable for definitive hematopoiesis in the zebrafish
embryo.

Tnfr2- and Tnfa-Deficient Embryos Display Normal 
Vasculogenesis 

Because HSCs originate in arterial vessels, many mutants with
vascular or arterial specification defects also have hematopoietic
defects (Bigas and Espinosa, 2012). No vascular abnormalities
were observed in Tnfr2- or Tnfa-deficient embryos at 24
hpf when assayed by WISH for the endothelial marker kdrl at
the MO doses used in this study (Figure 2A). Circulation
was normal (gata1:DsRed+ red blood cells), but numbers
of HSPCs and thrombocytes (cd41:eGFP+) were reduced at 3 dpf
(Figure 2B). These results suggest that the functions of Tnfr2 and
Tnfa are required specifically during HSC development independently
of their role in developing vasculature. Thus, we could
uncouple the vascular defects previously described for Tnfr2 (Espin
et al., 2013) from its effects on HSC development using lower
doses of Tnfr2 MO.

To address whether HSC defects in Tnfr2- and Tnfa-deficient
animals were a consequence of impaired arterial specification,
we performed WISH for the arterial markers efnb2a, dlc,
notch1b, and notch3 (Lawson et al., 2001) in morphant embryos
at 28 hpf. We observed no alterations in transcript levels when
compared to control siblings (Figure 2C). Taken together, these
data indicate that Tnfa signaling through Tnfr2 is specifically
required for HSC development.

Tnfr2 Is Intrinsically Required for HSC Development 

Because Tnfr2 is expressed in endothelial cells (Figure S1A), we
hypothesized that Tnfr2 is intrinsically required within the
vascular lineage for HSC development. To test this hypothesis,
we generated a transgenic zebrafish line in which the WT form
of tnfr2 is upregulated via induction of the Gal4 transcriptional
transactivator. HSC development was observed by confocal microscopy
following overexpression of Tnfr2 specifically within
the vasculature in fli1a:Gal4; UAS:RFP; cmyb:GFP; UAS:tnfr2
animals. The number of RFP+GFP+ HSCs in quadruple-transgenic
embryos was significantly increased compared to their
Tnfr2− siblings (Figures 2D and 2E), demonstrating that Tnfr2 activity
induces or supports the HSC program following targeted
expression to the vasculature.

To verify that the loss of HSCs in Tnfr2 morphants was not due
to the apoptosis of endothelial cells, we performed a TUNEL
assay and immunohistochemistry for GFP in kdrl:GFP embryos
injected with Tnfr2 MO. Analysis of endothelial cells by confocal
microscopy at 28 hpf indicated that loss of Tnfr2 did not increase
the number of apoptotic endothelial cells within the DA (Figure S2A),
even though there was an increase in apoptotic nonendothelial
cells. As a positive control for apoptosis in control animals, we
imaged the lens of the eye (Cole and Ross, 2001) (Figure S2B).
We also performed WISH for runx1 in the same experiment to
verify the reduction of HSCs in these embryos (Figures S2C
and S2D). These results, together with the finding that there
are no detectable apoptotic endothelial cells in the DA at 28
hpf (Kobayashi et al., 2014), indicate that the HSC specification
defect in Tnfr2-deficient embryos is not caused by apoptosis
induced by alterations of Tnfr1/Tnfr2 ratios within the
vasculature.

Tnfa Signaling Acts Upstream of Notch during HSC 
Specification 

During Notch activation, Notch receptors are stimulated by ligands
from neighboring cells, triggering the cleavage of the
Notch intracellular domain (NICD), which enters the nucleus to 



(B) Quantification of cmyb+ HSPCs from (A). Each dot represents total cmyb+ cells per embryo. The mean ± SEM for each group of embryos is shown in red.

(C) cd41:eGFP transgenic embryos were injected with Std, Tnfr1, Tnfr2, and Tnfa MOs and subjected to flow cytometric analysis at 3 dpf. Each bar represents the
percentage of cd41:GFP+ cells in each sample and is the mean ± SEM of three to seven independent samples of five embryos each.

(D) Maximum projections of 48 hpf cmyb:GFP; kdrl:mCherry double-transgenic embryos injected with Std, Tnfr2, and Tnfa MOs. Arrowheads denote cmyb+, kdrl+
HSCs along the DA. All views: anterior to left.

(E) Enumeration of cmyb+, kdrl+ HSCs shown in (D). Bars represent mean ± SEM of Std (n = 13), Tnfr2 (n = 13), and Tnfa (n = 8) morphants.

(F) cmyb−, kdrl+ endothelial cells and cmyb+, kdrl+ HSCs were isolated from cmyb:GFP; kdrl:mCherry transgenic fish by FACS at 48 hpf and examined for
expression of tnfr1 and tnfr2. Bars represent mean ± SEM of two biological replicates.

(G) Confocal tracking of HSC numbers in the floor of the DA from individual cmyb:GFP; kdrl:mCherry transgenic animals at 36 and 48 hpf following depletion of
Tnfr2 or Tnfa compared to standard control morphants.

(H) Means ± SEM of cmyb+ cell numbers from (G).

(I and J) WISH for the T lymphocyte and thymic epithelial markers rag1 (I) and foxn1 (J) (black arrowheads), respectively, in Tnfr2 and Tnfa morphants compared to
Std controls at 4 dpf. All views are ventral, with anterior to the left. Numbers represent embryos with displayed phenotype; ns, not significant; *p < 0.05, **p < 0.01,
and ***p < 0.001.

See also Figure S1.
Figure 2. Signaling through Tnfr2 Regulates HSC Development Independently of Its Role in Vascular Formation

(A) Std, Tnfr2, and Tnfa morphants were interrogated by WISH for the expression of kdrl at 24 hpf.

(B) cd41:eGFP; gata1a:dsred double-transgenic embryos were injected with Std, Tnfr2, and Tnfa MOs and visualized at 3 dpf. Arrowheads indicate cd41:GFP+
HSPCs in the CHT located between the caudal artery (CA) and caudal vein (CV). Arrows indicate blood flow direction.

(C) Expression of the arterial markers efnb2a, dlc, notch1b, and notch3 in Std, Tnfr2, and Tnfa morphants analyzed by WISH at 28 hpf. Arrowheads denote the CA.

(D) Maximum projections of fli1a:Gal4; UAS:tnfr2; cmyb:GFP; kdrl:mCherry transgenic embryos at 48 hpf. Region shown includes the DA, and arrowheads denote
cmyb+, kdrl+ HSCs.

(E) Enumeration of cmyb+, kdrl+ HSCs shown in (D). Each dot is the number of kdrl+; cmyb+ cells per embryo. The mean ± SEM for each group is shown in red.
**p < 0.01. All views are lateral, with anterior to the left. Numbers in panels represent larvae with indicated phenotype.

See also Figure S2.
function as a transcription factor essential for cell fate decisions
(Lai, 2004). There are four Notch receptors (Notch1a, 1b, 2,
and 3), five Delta family ligands (Dla, Dlb, Dlc, Dld, and Dll4),
and three Jagged ligands (Jagged 1a, Jagged 1b, and Jagged
2) in zebrafish. Because TNFα activates the Notch pathway in
certain contexts (Fernandez et al., 2008; Wang et al., 2013), we 
queried whether signaling through Tnfr2 may similarly activate 
Notch signaling to specify HSCs. We performed loss-of-function 
experiments for Tnfr2 and Tnfa in transgenic tp1:eGFP animals, 
in which GFP is expressed by cells having recently experienced 
Notch signaling (Parsons et al., 2009). Consistent with our other 
findings, the depletion of either Tnfa or Tnfr2 led to a 2-fold 
reduction in tp1:eGFP+; kdrl:mCherry+ HSPCs in the aortic floor
at 26 hpf (Figure 3A, arrowheads, and Figure 3B). These obser- 
vations indicate that Tnfr2 signaling is upstream of Notch 
signaling during HSC specification. 

If Notch signaling is indeed required downstream of Tnfr2
function for HSC specification, then ectopic expression of the
Notch1a intracellular domain (NICD1a) should rescue the lack
of HSCs in Tnfr2- and Tnfa-deficient embryos. We performed
two different experiments to address the timing and tissue specificity
of this Tnfa-dependent Notch requirement. To provide
temporal control of NICD1a induction, we utilized inducible
hsp70:Gal4; UAS:NICD1a-myc double-transgenic embryos,
which express NICD1a under the control of the inducible Gal4
system. Induction of NICD1a at 18 hpf rescued the depletion of
runx1+ HSCs at 28 hpf along the DA in both Tnfa and Tnfr2 morphants
(Figure 3C). We then enforced the expression of NICD1a
within endothelial cells utilizing kdrl:Gal4; UAS:NICD1a-myc
double-transgenic embryos that had been injected with Tnfr2
or Tnfa MOs. Endothelial expression of NICD1a restored
runx1+ cells along the aortic floor (Figure 3D), indicating that
TNF signaling activates the Notch pathway within hemogenic
endothelium to specify HSC fate.

Tnfa Induces Jag1a within Endothelial Cells to Promote
HSC Specification through Notch1a

We next investigated potential mechanisms by which Tnfa and
Tnfr2 induced Notch activation. Because Tnfa
signaling has been reported to induce or inhibit the expression
of specific Notch ligands (Fernandez et al., 2008; Sainson
et al., 2008), we analyzed expression of the eight zebrafish Notch
ligands within purified kdrl+ endothelial cells from Tnfr2-deficient
embryos. Only jag1a expression was downregulated in Tnfr2
morphants relative to controls (Figure 4A). Using a fli1a:Gal4
driver to enforce expression of Tnfr2 specifically within the
vasculature, we examined Notch ligand expression in fli1a:Gal4;
UAS:tnfr2 animals by qPCR (Figure 4B). We detected a 20-fold
increase of tnfr2 in UAS:tnfr2+ compared to UAS:tnfr2− embryos
(Figure 4B). Consistent with our previous results, only jag1a
mRNA levels were increased following the enforced expression
of Tnfr2 (Figure 4B).

Interestingly, Jag1 is required for the generation of definitive hematopoietic
cells in mice but is dispensable for arterial development.
A potential role for Jag1 in zebrafish HSPC development
has not been addressed. Two paralogues of the single human JAG1
gene are present in the zebrafish genome: jag1a and jag1b.
Because only jag1a levels were modulated by Tnfr2, we performed
loss-of-function experiments with this gene. Loss of
jag1a led to decreased HSC numbers as analyzed by runx1
expression along the DA (Figure 4C). However, specification of
aortic fate was normal, as efnb2a and dlc levels were unperturbed
(Figure 4C). To further verify that Tnfr2 and Jag1a were in the same
genetic pathway, we performed synergy studies by coinjecting
low doses of Tnfr2 and Jag1a MOs simultaneously. Aortic
runx1+ cells were significantly reduced in Tnfr2- and Jag1a-double-deficient
embryos compared to single-deficient embryos (Figure 4D).
Tnfr2 function thus lies genetically upstream of jag1a during
HSC specification. To investigate potential Jag1a-presenting
cells, we isolated cmyb−, kdrl+ endothelial cells and cmyb+,
kdrl+ HSCs for qPCR analysis of jag1a at 48 hpf. jag1a transcripts
were 4-fold more abundant in endothelial cells than in HSCs (Figure S3),
suggesting that Notch signaling in HSCs or hemogenic
endothelium is activated by neighboring Jag1a+ endothelial cells.

We next investigated which of the four Notch receptors were
downstream of Jag1a during HSC induction. In the mouse,
Notch1 is required within HSCs or their lineal precursors to
instruct HSC fate. We therefore focused upon the two zebrafish
orthologs of human NOTCH1, Notch1a and Notch1b. To investigate
whether either receptor functioned downstream of Tnfr2 to
specify HSCs, we performed synergy experiments by coinjecting
low doses of Tnfr2 MO with morpholinos against Notch1a or
Notch1b. Only the simultaneous depletion of Tnfr2 and Notch1a,
but not Tnfr2 and Notch1b, led to a statistically significant
decrease in runx1 expression compared to single morphants at
28 hpf (Figure 4E). This finding suggests that Notch1a serves
as the Notch receptor for Jag1a to specify HSC fate downstream
of Tnfr2.

The Proinflammatory Transcription Factor NF-kB Is
Active in Emerging HSCs

Activation of TNF receptors by ligand binding leads to the recruitment
of adaptor proteins that trigger NF-kB activation (Aggarwal
et al., 2012). Moreover, the induction of Jag1 transcription by
Tnfa in murine endothelial cells is NF-kB dependent (Johnston
et al., 2009). Interestingly, NF-kB (as well as Tnfr2 and Jag1) is
necessary for embryonic vessel development (Santoro et al.,
2007). These lines of evidence suggested that NF-kB could
have a previously unappreciated role in HSC specification,
prompting us to examine its role in HSC development. We utilized
an NF-kB activation reporter transgenic line (Kanther
et al., 2011) in combination with the kdrl:mCherry transgene to
perform confocal analysis of the DA at different time points.
Interestingly, we observed NF-kB+ cells in the floor of the DA at
24 hpf, typically in pairs and in direct contact with each other
(Figure 5A). We also observed NF-kB+ cells along the roof of
the DA but at a much lower frequency than in the floor (data
not shown). NF-kB+, kdrl+ cells remained visible at 30 hpf
(Figure 5A) and underwent endothelial-to-hematopoietic transition
(EHT) (Movie S1), a characteristic feature of emerging HSCs.
To further evaluate whether HSCs had increased NF-kB activation
compared to their surrounding endothelial neighbors,
kdrl+; cmyb+ HSCs and kdrl+; cmyb− endothelial cells were isolated
from 48 hpf kdrl:mCherry; cmyb:GFP embryos by FACS
for qPCR analyses. Whereas endothelial cells had 20- to 30-fold
induction of the NF-kB response genes interleukin 1 beta
(il1b) and nuclear factor of kappa light polypeptide gene
enhancer in B cells inhibitor alpha a (ikbaa) relative to whole-embryo
expression, HSCs displayed 300- and 2,300-fold increases
in il1b and ikbaa, respectively (Figure 5B). Immunohistochemistry
for the NF-kB subunit p65 in kdrl:mCherry embryos showed
that, although p65 was detected in the cytoplasm of every cell as
expected, it was more intense in the pronephros (Figures 5C and
5D, yellow asterisks), in the DA, and in cells potentially undergoing
the endothelial-to-hematopoietic transition in the aortic floor
(Figures 5C and 5D, arrow). These results indicate that NF-kB
activation is a characteristic feature of emerging HSCs.

Multiple lines of evidence support the integration of the Notch 
and NF-kB signaling pathways during the differentiation of 
various cell types (Ang and Tergaonkar, 2007; Cao et al., 2011; 
Espinosa et al., 2010; Espinosa et al., 2003; Shin et al., 2006; 
Song et al., 2008). For this reason, we investigated whether
NF-kB+ cells in the floor of the DA also had active Notch
signaling, utilizing double-transgenic tp1:nlsCherry; NFKB:GFP
animals to simultaneously visualize Notch and
NF-kB activation, respectively. NF-kB+ cells in the floor of the DA were also
tp1+ (Figure 5E). No NF-kB+, tp1− cells were found in the floor
of the DA, suggesting that Notch is (or was previously) active in
NF-kB+ HSPCs.

NF-kB Activation Is Required for HSC Specification and
Acts Downstream of Tnfr2

To determine whether NF-kB function is required for HSC 
emergence, we developed a Tg(UAS:dn-ikbaa) transgenic ani- 
mal that functions as a dominant-negative inhibitor of NF-kB 
(Figures S4A and S4B). Similar truncation constructs have 
been utilized in vitro to inhibit NF-kB activation (Abbas and 
Abu-Amer, 2003). At 6 hr post-heat-shock in hsp70:Gal4;
UAS:dn-ikbaa animals, dn-ikbaa mRNA was detected
in dn-ikbaa+, but not in dn-ikbaa−, siblings (Figure S4C).
qPCR for the NF-kB response gene il1b in FACS-purified
fli1a+ endothelial cells showed significant downregulation in
the dn-ikbaa+ embryos compared to their dn-ikbaa− siblings
(Figures S4D and S4E). Lipopolysaccharide (LPS) challenge of
WT embryos produced a significant increase in il1b expression
compared to PBS-injected controls, as previously described
(van der Vaart et al., 2013), but not in dn-ikbaa+ embryos
(Figures S4F and S4G), indicating that dn-ikbaa+ embryos are unable
to trigger an inflammatory response through NF-kB. These
results thus demonstrate that UAS:dn-ikbaa embryos have
impaired NF-kB activation.

Blockade of NF-kB function at 20 hpf in hsp70:Gal4; UAS:dn-ikbaa
animals led to loss of HSCs at 48 hpf (Figure 6A). Loss of
NF-kB specifically within the vasculature using fli1a:Gal4;
UAS:dn-ikbaa double-transgenic embryos also led to a depletion
of cmyb+ cells (Figure 6B). qPCR for il1b in FACS-purified endothelial
cells showed a 3-fold decrease in Tnfr2 morphants (Figure 6C),
demonstrating that NF-kB acts downstream of Tnfr2
during HSC specification. Together, these results suggest that
NF-kB activation in hemogenic endothelium is a key event in
the specification of HSCs.

Primitive Neutrophils Are the Key Source of Tnfa 

In adult organisms, immune cells are the main source of TNFα,
including T and B lymphocytes, macrophages, and neutrophils 
(Aggarwal, 2003). From 22 to 72 hpf, the temporal window over 
which zebrafish HSCs emerge from aortic endothelium, the 
only leukocytes present are primitive myeloid cells, namely mac- 
rophages and neutrophils (Herbomel et al., 1999; Le Guyader 
et al., 2008). Interestingly, tnfa expression in the zebrafish em- 
bryo was not detectable during the first 9 hr of development 
but was expressed before 24 hpf (Espin et al., 2013) when 
HSCs are initially specified (Clements and Traver, 2013). We 
therefore hypothesized that primitive myeloid cells were the 
source of Tnfa. We isolated mpeg:GFP+ primitive macrophages
and mpx:GFP+ primitive neutrophils by FACS at two different
time points and performed qPCR for tnfa (Figure 7A). Although
both populations expressed tnfa, the highest expression was
observed within the neutrophil fraction (Figure 7A). We then utilized
a pu1 MO to specifically ablate both primitive myeloid
lineages in vivo (Rhodes et al., 2005). pu1 MO efficacy was validated
by WISH using the panleukocyte marker l-plastin and the
neutrophil marker mpx at 48 hpf (Figure S5A). Following ablation
of primitive myeloid cells in pu1 morphants, HSCs were enumerated
by confocal microscopy of kdrl+; cmyb+ cells. A 2-fold
decrease in HSC number was detected in pu1 morphants
compared to their control siblings (Figures 7B and 7C). To elucidate
which primitive myeloid population was responsible for the
decrease in HSC number, we utilized an irf8 MO (Li et al., 2011),
which skews myeloid development to almost entirely neutrophilic.
Loss of the macrophage lineage was confirmed in irf8
morphants by qPCR for the macrophage-specific marker
mpeg1 (Figure S5B). Surprisingly, the number of kdrl+; cmyb+
HSCs increased following loss of the macrophage lineage
(Figures 7D and 7E). In agreement with our tnfa expression data,
this result suggests that neutrophils are the key source of the
Tnfa needed for HSC emergence. To test this hypothesis, we
quantified tnfa expression levels in pu1- and irf8-deficient animals.
Expression of tnfa was consistently decreased following
loss of pu1 function and increased following loss of irf8 function
(Figure 7F). In addition, although runx1 was upregulated in irf8-deficient
embryos, the simultaneous depletion of Tnfa and Irf8
led to a marked reduction in runx1 expression, despite the
elevated numbers of neutrophils present (Figure 7G). These



Figure 3. Tnfa and Tnfr2 Act Upstream of Notch Signaling during HSC Specification 

(A) tp1:eGFP; kdrl:mCherry embryos injected with Std, Tnfr2, and Tnfa MOs were visualized at 26 hpf. Arrowheads indicate cells in the floor of the DA with active
Notch signaling.

(B) Enumeration of tp1+, kdrl+ HSCs from (A). Each dot represents the number of HSCs per embryo, and red lines indicate means ± SEM. ***p < 0.001.

(C) hsp70:Gal4; UAS:NICD-myc embryos injected with Std, Tnfr2, and Tnfa MOs were heat shocked at 18 hpf and WISH for runx1 was performed at 28 hpf.
Arrowheads denote HSCs along the DA.

(D) kdrl:Gal4; UAS:NICD-myc embryos injected with Std, Tnfr2, and Tnfa MOs were analyzed by WISH for runx1 at 28 hpf. NICD+ larvae were identified using
anti-myc-Alexa488 antibody (top). Numbers in panels represent the numbers of larvae with indicated phenotype.
Figure 4. Tnfr2 Induces jagged1a in Endothelial Cells, Encouraging HSC Specification

(A) kdrl:mCherry+ cells from dissected trunks of Std or Tnfr2 morphants were purified by FACS at 28 hpf for qPCR. Levels of indicated transcripts along x axis are
shown relative to the housekeeping gene ef1a. Bars represent means ± SEM of duplicate samples.

(B) AGM regions from fli1a:Gal4; UAS:tnfr2 embryos were dissected and subjected to qPCR for transcripts shown along x axis. Bars represent means ± SEM of
triplicate samples, with expression shown relative to the housekeeping gene rps11.

(C) Std (top) or Jag1a (bottom) morphants were interrogated for runx1 expression at 26 hpf and efnb2a and dlc at 28 hpf by WISH. Numbers represent larvae with
indicated phenotype.

(D) Enumeration of runx1+ cells in Tnfr2 and/or Jag1a morphants at 28 hpf.

(E) Enumeration of runx1+ cells in Tnfr2 and/or Notch1a and/or Notch1b morphants at 28 hpf. Each dot is the number of HSCs per embryo, and red lines indicate
means ± SEM (D and E). ***p < 0.001; ns, not significant.

See also Figure S3.

findings demonstrate that production of Tnfa from primitive neutrophils
is critical for the specification and/or maintenance of
HSC fate.

Overall, these data indicate that production of Tnfa from primitive
neutrophils activates Tnfr2, upregulating the expression of
Jag1a on the surface of endothelial cells. Jag1a in turn activates
Notch1a, triggering a signaling cascade whereby NF-kB triggers
a transcriptional program required for the emergence of HSCs
from hemogenic endothelium (Figures S5C and S5D).

DISCUSSION 

Traditionally, infection and inflammation were thought to play an 
indirect role in HSC homeostasis by causing increased prolifera- 
tion and skewed differentiation toward microbicidal immune cell 
lineages (Takizawa et al., 2012). However, recent studies indicate
that HSCs can respond directly to the inflammatory cytokines
interferon (IFN) α/β, γ, and TNFα (Baldridge et al., 2011;
King and Goodell, 2011). Additionally, there is evidence that
HSCs can upregulate cytokines under stress-induced hematopoiesis
(Zhao et al., 2014). Here, we examined a much earlier
step in the biology of HSCs: their specification and emergence
from hemogenic endothelium in the developing embryo. The 
emergence of HSCs from the aortic floor is transient and occurs 
during developmental windows when the surrounding environ- 
ment is relatively sterile, whether it is in utero in mammals or 
within the chorion in teleosts. It is therefore surprising that a 
key pathway underlying the canonical response to infection 
and inflammation is required to generate the founders of the 
adult hematopoietic system. Our studies in the zebrafish demon- 
strate that depletion of Tnfa or its cognate receptor Tnfr2 leads to 
depletion of emerging HSCs. The key event elaborated by Tnfr2 
appears to be activation of the Notch pathway because ectopic 
provision of Notch signaling rescued HSCs in the absence of 
Tnfa or Tnfr2 function. Although Notch signaling is required for 
HSC specification across vertebrate phyla, little is known 
regarding how this Notch event is regulated or which of the 
many receptors or ligands are necessary to fate HSCs from 
ventral aortic endothelium. 

That the HSC program can be rescued in either Tnfa or Tnfr2
morphants by enforced expression of NICD1a within the vasculature
demonstrates that the TNF pathway lies upstream of
Notch in HSC specification. Our results suggest that signaling
via Tnfr2 specifically controls Notch activation by inducing the
Notch ligand jag1a in cells within the DA. Synergy experiments
depleting Notch1a and Tnfr2 combinatorially indicate that
Notch1a is likely the receptor on HSCs that binds to the Jag1a
ligand presented by aortic endothelial cells. These findings are
consistent with studies in the mouse embryo, where Notch1 is
required cell autonomously within HSCs or their lineal precursors
for their specification (Hadland et al., 2004; Kumano et al., 2003).
The zebrafish Notch1a and Notch1b receptors are evolutionary
paralogues of mammalian Notch1 (Kortschak et al., 2001) and
are both expressed in the DA during the window of HSC emergence
(Quillien et al., 2014). Our findings extend these results
by demonstrating that Notch1 function is evolutionarily conserved
in the specification of HSCs and provide a more detailed
mechanism regarding how Notch1 may actually function in this
process. Further studies will be required to determine the precise
interactions between Jag1a and Notch1a and how these interactions
lead to establishment of HSC fate.

In addition to its regulation of the Notch pathway, our results
also suggest that Tnfa exerts its effects through NF-kB. Although
NF-kB is known to play a key role in adult mammalian hematopoiesis
(Gerondakis et al., 2012), a role in the embryonic emergence
of HSCs has not been reported. The utilization of an NFkB:GFP reporter
line allowed us to image the in vivo activation of NF-kB,
indicating that this activation is required within endothelial cells
of the DA for HSC emergence. Furthermore, these studies suggest
that this activity is downstream of Tnfa/Tnfr2 signaling.
Intriguingly, these data also demonstrate that NF-kB+ cells in
the floor of the DA are often positive for Notch activity when assessed
along with the tp1 Notch reporter line. Whereas recent evidence
suggests that Notch1 can modulate NF-kB activity in
different cellular contexts, it remains to be determined whether
one factor is epistatic to the other or if both may operate together
within the hemogenic endothelium to establish HSC fate.

In this study, we have also discovered an unexpected role for 
neutrophils in HSC development. Whereas macrophages are 
involved in a broad array of developmental processes (Wynn 
et al., 2013), an active role for neutrophils in modulating developmental
events has not been described. Here, we report for the
first time that primitive neutrophils are a major source of Tnfa 
and that the loss of either neutrophils or Tnfa results in the loss 
of developing HSCs. The prevailing view that primitive myeloid 
cells have evolved predominantly to provide early immunity is 
thus likely oversimplistic. At any time point during HSC emer- 
gence, whether early during HSC specification or later during 
EHT, we observed ~2-fold decreases in HSC number. That the 
lineal descendants of HSCs, most importantly T lymphocytes, 
are absent by 4-5 dpf indicates that the Tnfa/Tnfr2 signaling 
axis is required to sustain HSC function. Collectively, our findings 
suggest that activation of Tnfr2 is important both in hemogenic 
endothelium and in maintaining nascent HSC fate. It is important 
to note that tnfa is also expressed in endothelial cells (data not 
shown); contribution from the endothelium may thus play a role 
in either or both of these processes. The means to create condi- 
tional, tissue-specific gene disruption in the zebrafish will be 
required to precisely address the relative importance of each 
source. 

In conclusion, we show that TNFα, a cytokine that has become
the paradigm for induction of inflammatory responses, is also 
key in the establishment of the hematopoietic system through 
its influence on HSC formation in the developing embryo. In addi- 
tion to the known signaling inputs required to establish HSC fate, 
inflammatory signals should now be added to this list. A major 
challenge for the field is to integrate each of these required inputs 
to better understand their spatial and temporal requirements, 
such that this knowledge may be utilized to instruct HSC fate 
in vitro from human pluripotent precursors, a major unrealized 
goal of regenerative medicine. 

EXPERIMENTAL PROCEDURES 
Zebrafish Husbandry and Strains 

Zebrafish embryos and adults were mated, staged, raised, and processed 
as described (Westerfield, 2000) and maintained in accordance with UCSD
IACUC guidelines. See Extended Experimental Procedures for description of
transgenic lines. 

Heat-Shock Treatment 

For induction of hsp70l:Gal4-driven transgenes, embryos were placed in E3
medium and transferred to a 38°C water bath for 45 min at noted stages. 
Figure 5. NF-kB Is Active in Emerging HSCs

(A) Trunk region of kdrl:mCherry; NFKB:GFP double-transgenic animals visualized by confocal microscopy at 24 hpf (left) and 30 hpf (right). Each image is a 2 μm z
slice. Arrowheads denote HSCs.

(B) cmyb−, kdrl+ endothelial cells and cmyb+, kdrl+ HSCs were isolated by FACS at 48 hpf. Levels of the NF-kB target genes ikbaa and il1b, as well as the HSC
marker cmyb, are shown relative to ef1a. Bars represent means ± SEM of two biological replicates.



Figure 6. NF-kB Is Required for HSC Specification and Acts Downstream of Tnfr2

(A) hsp70:Gal4; UAS:dn-ikbaa embryos were heat shocked at 20 hpf. WISH for cmyb was performed at 48 hpf.

(B) WISH for cmyb in fli1a:Gal4; UAS:dn-ikbaa− (left) and fli1a:Gal4; UAS:dn-ikbaa+ (right) embryos. Arrowheads mark cmyb+ cells along the DA.

(C) kdrl:mCherry+ cells were FACS sorted from Std or Tnfr2 morphants at 28 hpf for qPCR. Levels of the NF-kB target gene il1b are shown relative to ef1a. Bars
represent means ± SEM from duplicate samples.

See also Figure S4.



Generation of Transgenic Animals

Tg(UAS:dnnfkbiaa) and Tg(UAS:tnfr2) embryos were generated by
Tol2-mediated transgenesis via the multisite Gateway cloning system (Invitrogen).
See also Extended Experimental Procedures.

Morpholino Injection

Specific antisense targeting MOs (Gene Tools) were resuspended in DEPC- 
treated water at 1-3 mM and injected in one-cell stage embryos. See also 
Extended Experimental Procedures. 

Enumeration of HSCs 

Animals were subjected to WISH for runx1 and cmyb at noted stages, and positive
cells were imaged and manually counted. Confocal microscopy was performed
on cmyb:GFP; kdrl:mCherry double-transgenic animals (Bertrand et al.,
2010a), tp1:eGFP; kdrl:mCherry double-transgenic animals, and NF-kB:GFP;
kdrl:mCherry double-transgenic animals. Z sections of the DA region were
captured on a Leica SP5 microscope (Leica) using Volocity Acquisition, Visualization,
and Restoration software (Improvision) and were manually counted.

Fluorescent Visualization of Blood Flow, HSPCs, and T Cells

To visualize blood flow, HSPCs, and T cells, cd41:eGFP; gata1:dsred embryos
at 3 dpf and lck:GFP larvae at 4 dpf, respectively, were anesthetized in Tricaine
(200 μg/ml) and examined using a Leica MZ16FA stereomicroscope.

Flow Cytometry and FACS 

Briefly, embryos were dechorionated with pronase, anesthetized in tricaine,
and dissociated with liberase or triturated with a P1000 pipette. The resulting
suspension was filtered with a 40 μm cell strainer, and flow cytometric acquisitions
or FACS were performed on a FACS LSRII. See also Extended Experimental
Procedures.

Whole-Mount RNA In Situ Hybridization 

WISH was carried out as described (Thisse et al., 1993). Probes for the gata1a,
csf1ra, kdrl, cmyb, runx1, foxn1, efnb2a, dlc, notch1b, notch3, and rag1 transcripts
were generated using the DIG RNA Labeling Kit (Roche Applied
Science) from linearized plasmids. The dn-ikbaa probe was generated from bp
118-933 of dn-ikbaa (see Figure S2). Embryos were imaged using a Leica
M165C stereomicroscope equipped with a DFC295 color digital camera (Leica)
and FireCam software (Leica).

Statistical Analyses 

Data were analyzed by analysis of variance (ANOVA). In all figures, solid red 
bars denote the mean, and error bars represent SEM. *p < 0.05, **p < 0.01,
and ***p < 0.001; n.s., not significant; n.d., not detected.
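
As an illustration of the summary statistics named above, the following minimal Python sketch computes a group mean with its SEM and a one-way ANOVA F statistic. It is only a sketch under stated assumptions: the function names are hypothetical, the counts are made-up placeholder values rather than data from this study, and the software actually used for the analyses is not specified here.

```python
from math import sqrt
from statistics import mean, stdev


def mean_sem(values):
    """Return (mean, SEM), where SEM = sample SD / sqrt(n)."""
    return mean(values), stdev(values) / sqrt(len(values))


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical cell counts per embryo (illustration only, not study data).
std_mo = [10, 12, 11, 13]
tnfr2_mo = [5, 6, 4, 5]
print(mean_sem(std_mo))
print(one_way_anova_f([std_mo, tnfr2_mo]))
```

Converting the F statistic into a p value requires the F distribution with (k - 1, n - k) degrees of freedom, which is left to a statistics package.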

Quantitative RT-PCR Analysis

RNA was isolated from tissues with RNeasy (QIAGEN), and cDNA was 
generated with qScript Supermix (Quanta BioSciences). Primers to detect 
zebrafish transcripts are described in Table S1. Relative expression levels
of genes were calculated by the following formula:

$$\text{relative expression} = 2^{-(\mathrm{Ct}[\text{gene of interest}] - \mathrm{Ct}[\text{housekeeping gene}])}$$
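
The relative expression formula can be sketched in a few lines of Python. This is a minimal illustration, assuming Ct values are already averaged across technical replicates; the function names and the Ct values are hypothetical and are not taken from this study.

```python
def relative_expression(ct_gene, ct_housekeeping):
    """Relative expression = 2 ** -(Ct[gene of interest] - Ct[housekeeping gene])."""
    return 2.0 ** -(ct_gene - ct_housekeeping)


def fold_change(sample_expression, reference_expression):
    """Ratio of two relative expression values (e.g., sorted cells vs. whole embryo)."""
    return sample_expression / reference_expression


# Hypothetical Ct values (illustration only, not study data).
sample = relative_expression(ct_gene=22.0, ct_housekeeping=20.0)
reference = relative_expression(ct_gene=24.0, ct_housekeeping=20.0)
print(sample, reference, fold_change(sample, reference))
```

Because Ct is a cycle number on a log2 scale, a gene that crosses threshold two cycles earlier in one sample than in another (at equal housekeeping Ct) is taken to be 4-fold more abundant, which assumes perfect doubling per cycle.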

Immunofluorescence of NICD+ Animals

The immunofluorescence staining for cMyc in hsp70:gal4; UAS:NICD-myc
zebrafish embryos was performed as previously described (Kim et al.,
2014).



(C) Schematic representation of the experimental design of (D). 28 hpf kdrl:mCherry animals were transversally sectioned and subjected to double immunohistochemistry
for mCherry (red) and NF-kB (green). DAPI (blue) was added to visualize nuclei.

(D) Maximum projections of 1 μm sections. Arrowhead indicates a potential HSC emerging in the DA. DA and CV are demarcated by dashed white lines. Yellow
asterisks indicate pronephric ducts.

(E) tp1:nls-mCherry; NFKB:GFP animals were visualized by confocal microscopy at 24 hpf. Each image is a 2 μm z slice. Arrowheads indicate HSCs. CV, caudal
vein; DA, dorsal aorta; N, neural tube; No, notochord; YE, yolk extension.

See also Movie S1.
Figure 7. Primitive Myeloid Cells Play a Key Role in HSC Specification

(A) Primitive neutrophils (mpx:GFP+) and macrophages (mpeg:GFP+) were isolated at 20-24 and 72 hpf by FACS, and tnfa expression was quantified by qPCR.
Expression was normalized to ef1a and is presented relative to whole-embryo expression. Bars represent means ± SEM of two independent experiments.

Detection of Apoptotic Cell Death by TUNEL Labeling

The TUNEL assay was performed as previously described (Espin et al., 2013) 
with slight modifications. See also Extended Experimental Procedures. 

Lipopolysaccharide Injections

Tg(hsp:Gal4; UAS:dn-ikbaa) embryos were manually dechorionated at 24 hpf,
followed by heat shock at 38°C for 50 min. Four hr post-heat-shock, 2 nl of
PBS or LPS (900 μg/ml) (L6511, Sigma) was injected into the posterior blood
island (PBI). Embryos were then harvested 1 hr postinjection (hpi), and RNA
was isolated for qPCR analysis.

Microtome Sections and Immunohistochemistry 

Embryos were fixed with 4% PFA, embedded in paraffin, and sectioned at
5 μm in thickness with a Leica microtome. Immunohistochemistry was performed
as previously described (Kobayashi et al., 2014). The following antibodies
were used: mouse anti-mCherry 1:500 (Abcam, ab125096), rabbit
anti-p65 (NF-κB) (RB-1638-P; Lab Vision) 1:200, donkey anti-rabbit IgG Alexa
Fluor 594-conjugated (Molecular Probes, A-21207) 1:1,000, and donkey
anti-mouse IgG Alexa Fluor 488-conjugated (Molecular Probes, A-11029) in
addition to DAPI 1:1,000 (Life Technologies, D3571).

SUPPLEMENTAL INFORMATION 

Supplemental Information includes Extended Experimental Procedures, five
figures, one table, and one movie and can be found with this article online at
http://dx.doi.org/10.1016/j.cell.2014.10.031.
SUMMARY

Fighting viral infections is hampered by the scarcity 
of viral targets and their variability, resulting in devel- 
opment of resistance. Viruses depend on cellular 
molecules — which are attractive alternative tar- 
gets — for their life cycle, provided that they are 
dispensable for normal cell functions. Using the 
model organism Drosophila melanogaster, we iden- 
tify the ribosomal protein RACK1 as a cellular factor 
required for infection by internal ribosome entry site 
(IRES)-containing viruses. We further show that 
RACK1 is an essential determinant for hepatitis C 
virus translation and infection, indicating that its 
function is conserved for distantly related human 
and fly viruses. Inhibition of RACK1 does not affect 
Drosophila or human cell viability and proliferation, 
and RACK1-silenced adult flies are viable, indicating
that this protein is not essential for general transla- 
tion. Our findings demonstrate a specific function 
for RACK1 in selective mRNA translation and un- 
cover a target for the development of broad antiviral 
intervention. 

INTRODUCTION 

Viral infections are a significant threat for all living organisms. In 
humans, acute and chronic viral infections cause a wide spec- 
trum of diseases, including life-threatening inflammation and 
cancer. A major challenge for the control of viral infections is 
that viruses, due to the small size of their genomes, offer few 
intrinsic targets either for recognition by the immune system or 
for inhibition by antiviral effector molecules. Furthermore, the er- 
ror-prone viral polymerases allow RNA viruses to rapidly escape 
detection by the immune system and to resist the adverse effects 
of directly acting antiviral molecules. Significantly, viruses rely on 
numerous host factors for essential functions during their life cycle.
These are not subject to rapid sequence changes and hence
provide good alternative targets for antiviral therapy. Therefore, 
a central challenge is to identify cellular factors required for viral 
replication but dispensable for normal cell function. 

RNA replication, transcription, and translation are critical 
steps in the life cycle of RNA viruses that involve interactions 
with host-cell molecules. In the model organism Drosophila 
melanogaster, the small interfering RNA (siRNA) pathway tar- 
gets viral RNAs (reviewed in Ding [2010]). In order to better 
characterize the contribution of the three core components of 
this pathway, Dicer-2, R2D2, and AGO2, we performed a proteomic
analysis of the complexes assembling around these
molecules in infected Drosophila cells (K.M., J.-L.I., Y.V., and
J.V., unpublished data). One protein copurifying with R2D2
and AGO2 in cells infected with the picorna-like Drosophila C
virus (DCV) was the evolutionarily conserved ribosomal protein
RACK1. The RACK1 protein has been extensively studied during
the last two decades and has been shown to be involved in
different aspects of cell regulation. RACK1 is an adaptor protein,
interacting with a variety of signaling molecules (e.g.,
PKC, Src, and MAPK) (Belozerov et al., 2014; Gibson, 2012;
Long et al., 2014) and is a component of the 40S subunit of
the ribosome (Coyle et al., 2009; Sengupta et al., 2004).
RACK1 is thus ideally suited to connect signal transduction
pathways to the regulation of translation (Nilsson et al., 2004).
Indeed, RACK1 was found to interact with the initiation factor
eIF6, which associates with the 60S subunit of the ribosome
and prevents its association with the 40S subunit. eIF6 phosphorylation
by RACK1-assisted PKC triggers its release from
the 60S subunit, thus promoting the formation of 80S active
ribosomes (Ceci et al., 2003).

Here, we show that RACK1 is mandatory for DCV replication 
but is largely dispensable for cell viability and proliferation. 
We further demonstrate that RACK1 is required for internal ribo- 
some entry site (IRES)-dependent translation in Drosophila and 
in human hepatocytes, where this factor is an essential deter- 
minant of hepatitis C virus infection. By contrast, RACK1 is 
not required for 5' cap-dependent translation. Collectively, our 
data unravel a specific function for ribosomal protein RACK1 
Figure 1. RACK1 Is Required for DCV Replication, but Not for
Viability or Proliferation in Drosophila Cells

(A and B) Quantification of DCV viral RNA levels by qRT-PCR (A) and of cell
numbers as estimated by DAPI staining (B) in cells treated with the indicated
dsRNAs to induce silencing. Cells treated with a dsRNA corresponding to GFP
and AGO2 sequences are used as a reference and a control, respectively.

(C) S2 cells stably transfected with a metallothionein promoter-driven vector
expressing a shRNA targeting the 5' UTR from the RACK1 gene were treated or
not with CuSO4 for 3 days, stained with DAPI and an anti-phospho-H3 antibody
(left), and counted (right).

Data represent the mean and SEM of at least three independent experiments.
ns, nonsignificant; *p < 0.05. See also Figure S1 and Table S1.



in selective mRNA translation of fly and human viruses and 
uncover a target for the development of broad antiviral 
intervention. 

RESULTS 

RACK1 Is Required for Dicistroviridae Infection in 
Drosophila 

In a proteomic analysis of the interactome of Dicer-2, R2D2, and
AGO2 in virus-infected cells (K.M., J.-L.I., Y.V., and J.V., unpublished
data), we identified 16 ribosomal proteins. To address the
functional relevance of this finding, we systematically depleted
these ribosomal proteins from S2 cells by RNAi and tested
DCV replication. Knockdown of most ribosomal genes affected
cell viability or proliferation and did not yield interpretable results
regarding DCV infection (Figures 1A and 1B). Indeed, silencing of
these genes may result in decreased ability of the cells either to
support viral replication or to control the infection. By contrast,
depletion of RACK1 (Figure S1A available online) did not affect
cell viability or proliferation in S2 cells (Figures 1B and 1C) or in
two other cell lines (Figure S1B). However, it resulted in a significant
decrease of DCV titer in infected cells (Figure 1A). Furthermore,
RACK1 silencing did not affect replication of either flock
house virus (FHV) or vesicular stomatitis virus (VSV) (Figures 2A
and 2B), indicating that the RACK1-depleted cells are not only
viable and able to proliferate but can also support replication
of other viruses. To test whether the effect of RACK1 was specific
to DCV or to the family to which it belongs, we infected S2
cells with Cricket Paralysis Virus (CrPV), another member of
the Dicistroviridae family. Replication of CrPV was also strongly
impaired when RACK1 was depleted (Figure 2B).

We next confirmed these findings in vivo. RACK1 null mutant 
flies are not viable, indicating that RACK1 exerts developmental 
functions (Kadrmas et al., 2007). In agreement with this finding, 
silencing RACK1 expression with a small hairpin RNA (shRNA) 
driven by the broadly active actin5C promoter was embryonic
lethal. When the thermosensitive Gal80 system was used to ex- 
press the shRNA only in adult flies, development occurred nor- 
mally and the adult flies expressed significantly reduced levels 
of RACK1 at the permissive temperature of 29°C (Figure 2C). 
The reduced levels of RACK1 did not affect the viability of the 
flies, although it reduced longevity by 20% at this temperature. 
In addition, the eggs laid by RACK1-silenced females showed
a phenotype similar to that of RACK1 mutants (Figure S1C)
(Kadrmas et al., 2007). Thus, even though RACK1 is required 
during development, it appears to be largely dispensable in adult 
flies. As expected, when these flies were challenged with DCV, 
both viral RNA and capsid protein levels were markedly reduced 
at 1 and 2 days postinfection compared to controls (Figure 2D). 
Overall, our data indicate that replication of the Dicistroviridae 
DCV and CrPV requires the ribosomal factor RACK1, which is
otherwise dispensable for the viability of S2 cells and adult flies. 

RACK1 Is Required for Viral IRES-Dependent 
Translation 

Our data indicate that RACK1 is required for a step of viral repli- 
cation specific to Dicistroviridae. Whereas FHV and VSV use a 
canonical strategy of cap-dependent initiation of translation, 
DCV and CrPV RNA recruits the 40S ribosomal subunit through 
IRES sequences to initiate translation (Figure S2A). Furthermore, 
although initially identified as a scaffolding protein involved in 
protein kinase C signaling, RACK1 is now recognized as a 
component of the 40S subunit of the ribosome. This suggested 
to us that RACK1 was required for viral translation. We first veri- 
fied that RACK1 is indeed required at the ribosome level for CrPV 
replication. We silenced RACK1 expression in a stable cell line 
using an shRNA targeting the 5' UTR (Figure S1D) and observed
a marked decrease in CrPV replication (Figure S1E). Transfection
of a vector expressing wild-type RACK1 restored CrPV replica- 
tion in these cells (Figure 3A). By contrast, expression of mutant 
proteins unable to interact with either RpS17 (D108Y) (Kuroha 
et al., 2010) or 18S rRNA (R38D/K40E) (Coyle et al., 2009) did 
not rescue CrPV replication (Figure 3A). We conclude that 
RACK1 is required in the 40S ribosomal subunit for CrPV 
replication. 
Figure 2. RACK1 Is Required for Replication
of DCV and CrPV, but Not FHV and VSV

(A and B) S2 cells were treated with either control 
(GFP) or RACK1 dsRNA for 4 days before chal- 
lenge with DCV, FHV, VSV, or CrPV. Viral infection 
was monitored by immunofluorescence using 
antibodies recognizing capsid proteins (A) and 
qRT-PCR (B) after 16 hr or, in the case of VSV, after
48 hr. The percentage of infected cells is indicated 
for each virus in (A). 

(C) Silencing of RACK1 expression in transgenic 
flies expressing a shRNA targeting the 5' UTR from 
the RACK1 gene using the Gal4-UAS system and 
the broadly expressed actin-Gal4 driver controlled 
by the thermosensitive (TS) tub-Gal80 repressor. A 
shRNA targeting the mCherry protein was used as 
a control. The lifespan of RACK1-depleted flies is
shown in the bottom graph. 

(D) RACK1-silenced flies infected by DCV after
5 days at 29°C show a decrease of the viral RNA
and protein, as indicated by qRT-PCR (top) and 
western blot. 

Data represent the mean and SEM from at least
three independent experiments. ns, nonsignificant;
dpi, days postinfection; *p < 0.05, **p < 0.01,
and ***p < 0.001. See also Figures S1
and S2.
To confirm that RACK1 is involved in translation from Dicistroviridae
RNAs, we tested whether its depletion affected translation
of luciferase reporters placed under the control of the two
IRES elements from CrPV (Figures 3B and S2C). Translation of
a 5' cap-dependent RNA was not affected in the absence of
RACK1, although it was affected when expression of eIF4E
was knocked down. Translation from the CrPV 5' IRES reporter
was not reduced, and was even slightly increased, when eIF4E
was silenced, suggesting that the 5' IRES drives noncanonical
translation. Interestingly, a significant reduction of luciferase
production was observed for the 5' IRES reporter in RACK1-silenced
cells (Figure 3B). Silencing of RACK1 did not affect
the amount of the 5' IRES reporter luciferase mRNA in the cells,
indicating that RACK1 affects translation, rather than RNA stability
(Figure S3). By contrast, translation driven by the intergenic
(IGR) IRES (Jan and Sarnow, 2002; Spahn et al., 2004)
was not affected by the level of RACK1 in the cells (Figure 3B).
Polysome profiles from S2 cells and RACK1-silenced stable
derivatives of these cells (Figure S1D) were similar, confirming
that RACK1 does not significantly affect general translation
(Figure 3C). Finally, we prepared cell-free translation extracts
from control and RACK1-depleted S2 cells and used them to
monitor translation of in-vitro-transcribed, capped, and IRES-dependent
RNAs. Translation of the 5' IRES reporter RNA was strongly reduced
in the RACK1-depleted extract. By contrast, translation of the 5' CAP and
IGR IRES-dependent reporters was not inhibited and was even slightly
stimulated (Figure 3D). Overall, our data indicate that ribosomal RACK1
is required for IRES-dependent translation of Dicistroviridae both ex vivo
and in vitro.

RACK1 Is an Essential Host Factor for HCV Infection 

RACK1 is an evolutionarily strongly conserved factor, and we 
asked whether it plays a role in the translation driven by the 
IRES of a mammalian virus. Hepatitis C virus (HCV), a major 
cause of liver disease and hepatocellular carcinoma, is a positive 
strand RNA virus member of the Flaviviridae family depending on 
a highly structured IRES for its translation (Figure S2) (Spahn 
et al., 2001). Transfection of an siRNA targeting RACK1 markedly 
reduced expression of the protein in Huh7.5.1 cells (Figure 4A), a 
human hepatocyte-derived cell line highly permissive for HCV 
infection (Lindenbach et al., 2005; Wakita et al., 2005). Infection 
of RACK1-depleted Huh7.5.1 cells by cell-culture-derived HCV
(Jc1 strain) was strongly and significantly reduced, as revealed 
both by immunodetection of the viral core protein (Figure 4A) 
and the focus-forming assay performed by infection of naive 
Huh7.5.1 cells with supernatants from infected and treated cells 
(Figure 4B). A similar inhibition of infection was observed for HCV 
Luc-Jc1 (Figure 4C), a well-characterized recombinant virus ex- 
pressing a luciferase reporter (Figure S2B). Inhibition of RACK1 
expression was as efficient as the silencing of the key HCV 
Figure 3. The Ribosomal Protein RACK1 Is
Required for IRES-Mediated Translation

(A) Stable S2 transformants expressing a shRNA 
targeting the 5' UTR of RACK1 were transfected 
with vectors expressing three versions of 
RACK1 (WT, D108Y, or R38D/K40E). Expression 
of the transfected RACK1 was monitored by 
western blot using an antibody recognizing the 
N-terminal tag HA. The cells were infected with 
CrPV for 16 hr, and viral RNA loads were 
determined by qRT-PCR. Data represent the 
mean and SEM from three independent experi- 
ments. 

(B) RACK1 is required for translation regulated by 
the 5' IRES, but not the IGR IRES, of CrPV. S2 
cells were treated with dsRNAs corresponding to 
GFP (control), AGO2, eIF4E, or RACK1 for 3 days
before transfection of the indicated Luciferase
reporters (5' CAP, IRESCrPV-IGR, or IRESCrPV-5'; see
Figure S2). Luciferase activity was monitored 48 hr 
later. The ratio of the activity of the IRES-depen- 
dent luciferase and the 5' cap-dependent lucif- 
erase is plotted and normalized to the control for 
the three reporters. Data represent the mean and 
SD from six independent experiments. 

(C) Polysome profiles from S2 cells expressing or 
not expressing a shRNA targeting the 5' UTR of 
RACK1. The position of the peaks corresponding
to the 80S ribosomes and the polysomes are 
indicated. 



(D) In vitro translation of capped and IRES-dependent reporters using cell-free extracts prepared from control or RACK1-silenced S2 cells.

Data represent the mean and SD from three independent experiments. ns, nonsignificant; *p < 0.05 and **p < 0.01. See also Figures S2 and S3.



host factors CD81 (Koutsoudakis et al., 2007) and Cyclophilin A 
(CypA) (Kaul et al., 2009) (Figures 4A-4C). We next transiently 
depleted RACK1 in Huh7.5.1 cells replicating the reporter virus 
HCV Luc-Jc1 and observed a marked impairment of HCV replica- 
tion (Figure 4D), demonstrating that RACK1 is required for HCV 
translation/replication rather than entry. HCV replication rebound 
observed after day 4 was due to progressive loss of RACK1 
silencing, leading to neosynthesis of RACK1 (Figure S4A). 

To confirm that the inhibition of HCV replication is indeed
mediated by the effect of RACK1 on IRES-mediated translation,
we established stable cell lines expressing an IRESHCV-luciferase
reporter construct or a classical capped reporter gene
(Figure S2C) and transfected these cells with RACK1-specific
siRNAs. Silencing of RACK1 markedly and specifically decreased
IRESHCV-dependent translation to a similar extent as
an antiviral siRNA directed against the IRESHCV (Figure 4E). By
contrast, silencing of ribosomal protein RPS3 inhibited translation
from both IRES- and 5' cap-dependent reporter constructs
(Figure 4E). Similar results were obtained when in-vitro-transcribed
reporter mRNAs were transfected into Huh7.5.1 cells,
ruling out an effect of RACK1 on transcription of the IRESHCV
luciferase reporter gene (Figure 4F).

Importantly, RACK1-specific siRNAs did not affect cell proliferation
(Figure S4B) or viability, in contrast to silencing of the
ribosomal protein RPS3 (Figure 4G). A genome-wide microarray
analysis of polysomes prepared from control or RACK1-silenced
human Huh7.5.1 cells revealed that the amount in polysomes of
mRNAs for most genes, including housekeeping genes and
important hepatocyte-specific genes such as albumin or lipoproteins,
was not affected by RACK1 depletion (Figure 4H). Of note,
silencing of RACK1 also did not affect the presence of 5' terminal
oligopyrimidine tract (TOP) mRNAs in polysomes (for details, see
Supplemental Information). This result suggests that translation
of the large majority of mRNAs is not affected by the absence
of RACK1 in human hepatocytes under normal culture conditions
and confirms the results obtained in the model organism
Drosophila.

The Effect of RACK1 on Viral Translation Is Independent
of the miRNA Pathway

While this work was in progress, a role for RACK1 in miRNA function
was reported in the plant Arabidopsis thaliana (Speth et al.,
2013), the model organism Caenorhabditis elegans (Chu et al.,
2014; Jannot et al., 2011), and humans (Otsuka et al., 2011). In
light of the important impact of the cellular microRNA miR122
on HCV replication (Jopling et al., 2005), this suggested that
RACK1 might operate on viral translation through the miRNA
pathway. We first verified that RACK1 affects the miRNA
pathway in Drosophila. Expression in S2 cells of two previously
described miRNA reporters, Par-6 and nerfin-1 (Eulalio et al.,
2007), was derepressed when RACK1 was silenced, indicating
that, in Drosophila as well, RACK1 is involved in miRNA function
(Figures 5A and 5B). We note, however, that the derepression is
much stronger for the miR1 reporter than for the miR9b reporter,
suggesting that the role of RACK1 may be specific to a subset of
miRNAs. By contrast, silencing of Dcr-1 or AGO1 derepressed
the two miR reporters equally well (Figures 5A and 5B). We
next tested whether miRNAs play a role in viral replication by
Figure 4. RACK1 Is a Specific Host Factor Required for
IRES-Mediated Translation of HCV

(A-C) Huh7.5.1 cells were transfected with siRNAs either control (siCTRL) or
targeting RACK1 (siRACK1-1 or -2), CD81 (siCD81), Cyclophilin A (siCypA), or
HCV IRES (siHCV) before infection 3 days later with HCV Jc1 (A and B) or HCV
Luc-Jc1 (C). Viral infection was monitored 3 days postinfection by immunoblotting
using antibodies recognizing HCV core protein (A), by counting
focus-forming units (ffu/ml) (B), or by quantifying luciferase activity (C).

(D) HCV Luc-Jc1 replicating cells were transfected with siCTRL and two
different siRNAs targeting RACK1 or siCypA, and replication was monitored
during 5 days by luciferase activity quantification.

(E) Huh7.5.1 cell lines stably expressing an IRES (IRESHCV-Luc) or a 5' cap
(CTRL-Luc)-dependent luciferase reporter gene were transfected with siCTRL,
siRACK1, siHCV, siRPS3, or siLuc. Translation was monitored 72 hr later by
luciferase activity quantification.

(F) Huh7.5.1 cells were transfected with the indicated siRNAs and 72 hr later
with in-vitro-transcribed IRESHCV or 5' cap-dependent luciferase mRNAs.
Luciferase activity was monitored 5 hr later.

(G) Cell viability of Huh7.5.1 cells silenced with the indicated siRNAs was
measured during 5 days using MTT assay. **p < 0.01 and ***p < 0.005. For (B)-(G),
data represent the mean ± SD of at least three experiments.



monitoring accumulation of viral RNAs in cells depleted of Dcr-1
or AGO1. Silencing of Dcr-1 had no effect on the viral RNA load
of the four viruses tested (Figure 5C). Silencing of AGO1 did
reduce, to some extent, CrPV and DCV RNA load. However,
this reduction was variable in the case of DCV and not to the
extent of the reduction observed when RACK1 was silenced
for DCV and CrPV (Figure 5C). Thus, although the miRNA
pathway may contribute to the replication of Dicistroviridae,
our data suggest that the strong effect of RACK1 cannot be
accounted for only by its effect on miRNA function. This was
confirmed by the observation that silencing of Dcr-1 or AGO1
had no effect on translation driven by the IRESCrPV-5', unlike
silencing of RACK1 (Figure 5D).

In mammalian hepatocytes, HCV translation depends on
AGO2 and miR122 (Conrad et al., 2013; Roberts et al., 2011).
As expected, transfection of Huh7.5.1 cells with a miR122 mimic
increased HCV replication, whereas transfection of a miR122
inhibitor led to decreased viral replication (Figure 5E). Importantly,
the impact of the miR122 mimic and the miR122 inhibitor on HCV
replication did not depend on RACK1 (Figure 5E). To unambiguously
determine whether the contribution of RACK1 to HCV
translation was dependent on miR122, we used HEK293T cells,
which do not express miR122 (Da Costa et al., 2012; Figure S5).
Silencing of RACK1 expression efficiently repressed translation
driven by the IRESHCV in these cells (Figure 5F). Finally, transduction
of HEK293T cells with an expression vector for miR122
did not affect the impact of RACK1 on HCV translation (Figure 5G),
although miR122 was expressed and functional in these
cells (Figures S5A and S5B). Collectively, these results indicate
that RACK1 and miR122 regulate HCV translation by different
mechanisms.

The eIF3j Subunit Is Dispensable for Cell Viability but Is
Important for CrPV and HCV Replication

We next attempted to gain mechanistic insight into the role of
RACK1 in viral translation. Previous cryo-electron microscopy
studies have highlighted the interaction of the 40S subunit with
the HCV IRES and, in spite of their low resolution, have suggested
that binding of the HCV IRES triggers a pronounced
conformational change in the small subunit of the ribosome
(Spahn et al., 2001, 2004). HCV IRES has also been visualized
on the 80S human ribosome and RACK1 localized in its vicinity
(Boehringer et al., 2005; Sengupta et al., 2004). The recently
elucidated crystal structure of the small subunit of the ribosome
at 3.9 Å (Rabl et al., 2011) allows us to fit the crystal structure
in the cryo-electron microscopy density. The picture obtained
suggests that RACK1 is located in close proximity to the IRES
of HCV in the region affected by the conformational change
triggered upon IRESHCV binding (Figure S6A). By contrast, the
IRESCrPV-IGR, which does not depend on RACK1 (Figure 2C),
interacts with a distinct site of the 40S subunit, directly contacting



(H) Quantification of representative mRNAs in polysomes prepared from
Huh7.5.1 cells transfected with siCTRL or siRACK1. Gene expression levels,
shown in arbitrary units, were determined by hybridization on genome-wide
microarrays and represent the mean ± SD of four individual samples.

Each sample was analyzed individually. See also Figures S2 and S4 and
Table S2.
Figure 5. The Effect of RACK1 on Viral Translation Is Independent of
the miRNA Pathway

(A and B) RACK1 is required for miR1 and miR9b silencing. The structure of the
Par-6 3' UTR and nerfin-1 3' UTR reporter constructs is represented on top,
and the luciferase activity in cells silenced for the indicated genes is shown
below.

(C) Effect of the depletion of AGO1, Dcr-1, and RACK1 on replication in
Drosophila S2 cells of CrPV, DCV, FHV, and VSV. Cells were transfected with
the indicated dsRNAs and infected 4 days later. Viral RNA was extracted 24 hpi
and was quantified by qRT-PCR.

(D) Silencing of AGO1 or Dcr-1 does not affect the activity of a Luciferase
reporter gene controlled by the IRESCrPV-5' in Drosophila S2 cells.

(E) A miR122 mimic and a miR122 inhibitor affect HCV replication similarly in
control or RACK1-silenced Huh7.5.1 hepatocytes.

(F and G) Silencing of RACK1 affects the activity of the IRESHCV-luciferase
reporter in miR122-deficient (F) and stably transfected miR122-expressing (G)
HEK293T cells, respectively.

Data represent the mean and SEM of at least three independent experiments.
ns, nonsignificant; *p < 0.05, **p < 0.01, and ***p < 0.001. See also Figure S5.



RpS25 (Figure S6B) (Fernandez et al., 2014; Koh et al., 2014;
Schuler et al., 2006; Spahn et al., 2004). Although no direct contacts
between RACK1 and IRESHCV could be observed, a recent
study indicates that a peripheral domain of the translation initiation
factor eIF3, which is required for IRESHCV-dependent
translation (Kieft, 2008), is in contact with RACK1 (Figure S6C)
(Hashem et al., 2013a; Sun et al., 2013). This domain may be
the functional link between RACK1 and IRESHCV-dependent
translation.

We asked whether some subunits of eIF3, such as eIF3c,
which has been shown to interact with RACK1 in yeast, may
be specifically involved in IRES-dependent translation, like
RACK1. We first tested in Drosophila S2 cells whether some subunits
of the eIF3 complex are dispensable for cell viability in
normal culture conditions. Out of the 14 genes encoding eIF3
components (the Drosophila genome contains two eIF3g paralogues,
CG8636 (eIF3ga) and CG10881 (eIF3gb)), only two
were not required for cell viability or proliferation (Figure 6A).
One of these genes is CG10881, encoding eIF3gb, which is expressed
specifically in testis (Chintapalli et al., 2007) and thus
provides a useful negative control. The second gene is the
Drosophila ortholog of eIF3j (Figure 6A). We next monitored
CrPV replication in cells silenced for eIF3j or eIF3gb
(CG10881). Although silencing of eIF3gb did not affect CrPV
replication, silencing of eIF3j resulted in a significant reduction
of CrPV replication (Figure 6B). Silencing of eIF3j, but not of
eIF3gb, also affected translation of the IRESCrPV-5'-luciferase reporter,
although not as strongly as silencing of RACK1 (Figure 6C).
In Huh7.5.1 cells, silencing of eIF3c affected cell viability.
By contrast, silencing of eIF3j only marginally affected cell
viability (Figure 6D; Wagner et al., 2014). Interestingly, however,
it resulted in a moderate but significant decrease of HCV replication
(Figure 6E). Altogether, these results suggest that the eIF3j
subunit might participate in the observed effects of RACK1 on
translation.

DISCUSSION 

A Function for RACK1 in IRES-Dependent Translation 

Our data reveal a function for RACK1 in specific mRNA translation.
Indeed, silencing RACK1 expression does not affect
viability of Drosophila S2 or human Huh7.5.1 cells in tissue cul- 
ture, indicating that formation of active ribosomes is not strictly 
dependent on RACK1. In vivo as well, translation can occur in
the absence of RACK1, as lethality in RACK1 mutant animals
does not occur before larval stages for Drosophila and gastrula- 
tion in mice (Kadrmas et al., 2007; Volta et al., 2013). In agree- 
ment with this observation, translation of a 5' cap-dependent 
reporter was not affected in the absence of RACK1 in Drosophila 
and human cells. Nevertheless, the fact that RACK1 mutant an- 
imals cannot complete their development suggests that this pro- 
tein is required for the translation of some cellular mRNAs, in 
addition to viral IRES-containing RNAs. Interestingly, previous 
studies have highlighted the role of another protein from the 
40S subunit of the ribosome, RpS25, in IRES-dependent trans- 
lation (Landry et al., 2009). Performed on yeast and mammalian 
tissue-culture cells with IRES reporter assays, these experi- 
ments concluded that RpS25 is essential for the activity of two 
viral IRES, IRESHCV and IRESCrPV-IGR. The mechanism used by
RpS25 and RACK1 to promote translation is probably different
because (1) RpS25 is required for IRESCrPV-IGR, unlike RACK1,
and (2) structural data place RpS25 at a distance from RACK1 
tion of translation of specific mRNAs (reviewed in Xue and Barna
[2012]). They have implications for the development of new antivirals
and raise questions on the mechanism underlying the role
of RACK1 in IRES-dependent translation. 

RACK1 as a Target for Broad Antiviral Intervention 

Our results open interesting therapeutic perspectives for a broad 
range of viral infections, including chronic hepatitis C, a major 
cause of liver cirrhosis and cancer. Because HCV translation ini-
tiates viral genome neosynthesis via the formation of the replica-
tion complex, RACK1-mediated translation is a crucial step in
virus propagation. Thus, RACK1 is a novel host target for antiviral 
therapy, which is complementary to interferon-based therapies 
or direct-acting antivirals (DAAs). DAAs have achieved high 
response rates with cure in late-stage clinical trials, but high 
costs will limit their broad access. In addition, certain patient 
groups (e.g., genotype 3, renal failure, hepatic decompensation, 
and liver transplantation) will need complementary approaches 
(Chung and Baumert, 2014; Liang and Ghany, 2013). 

The low variability of host factors targeted by host-targeted 
antivirals (HTAs) results in a high genetic barrier to resistance 
(Nathan, 2012). Indeed, HTAs effectively inhibit HCV escape 
variants (Fofana et al., 2010; Lupberger et al., 2011), as well as 
DAA-resistant virus (Xiao et al., 2014a). Furthermore, their com- 
plementary mechanism of action results in synergy with DAAs 
(Xiao et al., 2014b). Given that HTAs interfere with host targets, 
one theoretical caveat is the possibly greater risk of cellular 
toxicity as compared to DAAs. Interestingly, our data obtained 
in cell culture models did not reveal any major toxicity linked to 
RACK1 inhibition. Thus, our proof-of-concept studies in state- 
of-the-art cell culture models open a highly attractive and inno- 
vative perspective to develop small molecules targeting 
RACK1. RACK1 inhibitors may also be of interest for treatment
of infection of many other human or animal viruses using 5' 
cap-independent mechanisms for the translation of their RNAs. 



Figure 6. eIF3j Is Required for CrPV and HCV Replication, but Not for
Cell Viability

(A) Quantification by the MTS assay of the number of viable cells 5 days after
treatment of S2 cells with the indicated dsRNAs. Two different dsRNA prep-
arations, targeting different regions of the gene, were used for eIF3j.

(B) Quantification by qRT-PCR of CrPV viral RNA levels in S2 cells treated with
the indicated dsRNAs.

(C) Activity of the IRESCrPV-5' in S2 cells silenced for the indicated genes.

(D) Quantification of Huh7.5.1 cell viability after silencing of the indicated
genes.

(E) Quantification of HCV replication in Huh7.5.1 cells transfected with the
indicated siRNAs.

Data represent the mean and SD of at least three experiments. See also
Figure S6.

on the 40S subunit of the ribosome, providing an explanation for 
its importance for the activity of the IRESCrPV-IGR. Several other
ribosomal proteins (e.g., RpL38 and RpL40) were recently pro- 
posed to be involved in specific translation of some 5' cap- 
dependent mRNAs (Kondrashov et al., 2011; Lee et al., 2013), 
indicating that transcript-specific regulation can occur in the 
absence of IRES elements. Our data lend support to an evolving 
picture of the eukaryotic ribosome, which includes structurally 
peripheral components such as RACK1 involved in the modula- 



Mechanistic Insight on the Role of RACK1 in 
IRES-Dependent Translation 

While this work was in preparation, several reports described a 
role for RACK1 in miRNA function. However, our data in 
Drosophila and human cells indicate that the role of RACK1 in 
IRES-dependent translation does not involve small regulatory 
RNAs. Nevertheless, the connection between RACK1 and AGO 
proteins is intriguing and suggests that RACK1 may participate 
in a checkpoint for the control of the translation of specific 
mRNAs by miRNAs or siRNAs. 

The ribosome code or filter hypothesis posits that some ribo- 
somal proteins have evolved to mediate translation of specific 
mRNAs (Mauro and Edelman, 2002; Topisirovic and Sonenberg, 
2011; Xue and Barna, 2012). A central unresolved issue of this 
hypothesis is the nature of the cis-acting elements defining a
possible “ribosome code.” In the case of RACK1, these cis-acting
elements include viral IRESs. Interestingly, the IRESCrPV-IGR is
active in the absence of RACK1, unlike the IRESCrPV-5' or the
IRESHCV. This IRESCrPV-IGR (class I IRES) is capable on its own,
without any initiation factors, of binding the 40S subunit directly
and of recruiting the 60S subunit to form an active 80S ribosome,
thus bypassing the loading of the initiator methionyl-tRNAi
(Jan and Sarnow, 2002; Pestova et al., 2004). By contrast, the
function of IRESHCV (class II IRES) requires two canonical eIFs,
eIF2 and eIF3, as well as Met-tRNAi (Kieft, 2008). This suggests
that the effect of RACK1 on translation initiation may require
one of these factors. Interestingly, the eIF3 complex binds to
the 40S ribosomal subunit and to the IRESHCV (e.g., Kieft et al.,
2001). Furthermore, RACK1 was shown to associate with one of
the eIF3 subunits in order to assemble a translation preinitiation
complex in yeast (Hashem et al., 2013a; Kouba et al., 2012).

Although our understanding of the molecular structure of the
core of the 13-subunit eIF3 complex has progressed remarkably
in recent years (e.g., Hashem et al., 2013b; Sun et al., 2011), the
role of the noncore subunits remains essentially untested in an-
imals. Interestingly, the subunit eIF3e in the yeast Schizosac-
charomyces pombe is involved in translation of a selected set
of RNAs (Sha et al., 2009; Zhou et al., 2005). More recently,
one of the two eIF3h genes present in zebrafish, eIF3ha, was
shown to encode a factor specifically targeting crystallin iso-
form mRNAs for translation during lens development (Choudhuri
et al., 2013). Our data indicate that, like RACK1, the subunit eIF3j
is not required for cell viability in Drosophila but is required for
CrPV replication and IRESCrPV-5'-driven translation. This raises
the possibility that RACK1 and eIF3j act together in translation
of a specific subset of mRNAs.

Several observations support a role for eIF3j in selective mRNA
translation. First, it is located in the decoding center of the 40S
ribosomal subunit, where it can regulate access to the mRNA
binding cleft (Fraser et al., 2007; 2009). Second, it is located at
the periphery of the eIF3 complex, often in substoichiometric
quantities, indicating that it can undergo regulated cycles of as-
sociation and dissociation (Hinnebusch, 2006; Miyamoto et al.,
2005; Sha et al., 2009). Third, experiments in S. pombe and hu-
man cells indicate that it can be regulated posttranslationally
by phosphorylation (Sha et al., 2009) or caspase-mediated C-ter-
minal truncation (Bushell et al., 2000). Altogether, this suggests
that RACK1 may act as a scaffold recruiting an enzyme modifying
eIF3j in order to allow access of the entry channel of the 40S sub-
unit to IRES-containing mRNAs. In a way, such a scenario would
be reminiscent of the recently described role of another eIF3 sub-
unit, eIF3e, which controls the recruitment of the kinase Mnk1 to
phosphorylate eIF4E, thus promoting selective mRNA translation
in human cells (Walsh and Mohr, 2014).

EXPERIMENTAL PROCEDURES 

Silencing Candidate Gene Expression by RNAi and Screening 

dsRNAs targeting the candidate genes were designed using the E-RNAi
algorithm (http://www.dkfz.de/signaling/e-rnai3/) (Table S1). Knockdown in
Drosophila S2 cells was performed in 96-well plates using the bathing method,
and cells were challenged with virus 4 days later. Viral load was determined by
qRT-PCR. Alternatively, infected cells were fixed and labeled with anti-capsid
antibodies for immunofluorescence analysis using the InCELL 1000 Analyzer
workstation (GE LifeSciences). Image data processing was performed using
the InCELL Analyzer software. See Extended Experimental Procedures in Sup-
plemental Information for more details.

Preparation of Cell-free Extract for In Vitro Translation 

In vitro translation competent extracts were prepared from control or RACK1-
silenced S2 cells as described in Wakiyama et al. (2006). Briefly, cells were re-
suspended in lysis solution (40 mM HEPES-KOH [pH 8], 100 mM potassium



acetate, 1 mM magnesium acetate, and 1 mM dithiothreitol) at a cell density 
of ~10® ml^-1 and were placed in the Cell Disruption Bomb (Parr Instrument
Company). The homogenate produced upon the pressure release was cleared 
by centrifugation at 4°C, and creatine kinase was added at 0.24 mg ml^-1 of
lysate before storage in aliquots at -80°C. Reporter mRNAs were synthe-
sized by transcription in vitro using recombinant T7 RNA polymerase. A
nonfunctional cap (ApppG) (New England Biolabs) was added at the 5' end 
of the IRES monocistronic reporter mRNAs to protect them from degradation. 
Cap-dependent translation was measured with a Renilla Luciferase reporter 
mRNA that was capped with the ScriptCap m7G capping system (Epicenter 
Biotechnologies). In vitro translation was performed as previously described 
(Wakiyama et al., 2006) and under subsaturating conditions to avoid substrate 
titration. 

HCV Infection and Replication Assays 

Huh7.5.1 human hepatoma cells were infected with cell-culture-derived HCV
(HCVcc strains Jc1 and Luc-Jc1, half-maximal tissue culture infectious dose
[TCID50 10^ ml^-1 for both viruses]) as described (Lupberger et al., 2011;
Pietschmann et al., 2006). Two days before infection, gene silencing was per- 
formed by reverse transfection with 10 nM of siRNA (Silencer Select siRNA, 
Ambion) specific for RACK1, CD81, Cyclophilin A, HCV IRES, or a nonspecific
control siRNA. Viral infection and RACK1 depletion were analyzed by western
blotting and quantified by counting of focus forming units (ffu)/ml following im-
munostaining using an HCV core-specific antibody (mAb C7-50, Affinity Bio-
Reagents, CO) or by luciferase reporter gene expression in cell lysates
3 days postinfection. For HCV replication experiments, Huh7.5.1 cells were
electroporated with HCV Luc-Jc1 RNA (Koutsoudakis et al., 2007). Cells
were reverse transfected with siRNAs 3 days later. Global analysis of poly-
some-associated mRNAs in control and RACK1-silenced Huh7.5.1 cells
was performed on Agilent “SurePrint G3 Human Gene Expression v2” 8x60k 
microarray (Table S2 and see Extended Experimental Procedures for more 
details). 

ACCESSION NUMBERS 

The microarray data presented in Figure 4H have been deposited in the Gene 
Expression Omnibus database under the accession number GSE60374. 
figures, and two tables and can be found with this article online at http://dx.

SUMMARY

The HIV-1 Gag protein orchestrates all steps of virion 
genesis, including membrane targeting and RNA 
recruitment into virions. Using crosslinking-immuno- 
precipitation (CLIP) sequencing, we uncover several 
dramatic changes in the RNA-binding properties of 
Gag that occur during virion genesis, coincident 
with membrane binding, multimerization, and pro- 
teolytic maturation. Prior to assembly, and after 
virion assembly and maturation, the nucleocapsid 
domain of Gag preferentially binds to psi and Rev 
Response elements in the viral genome, and GU- 
rich mRNA sequences. However, during virion gene- 
sis, this specificity transiently changes in a manner 
that facilitates genome packaging; nucleocapsid 
binds to many sites on the HIV-1 genome and to 
mRNA sequences with an HIV-1-like, A-rich nucleotide
composition. Additionally, we find that the matrix 
domain of Gag binds almost exclusively to specific 
tRNAs in the cytosol, and this association regulates 
Gag binding to cellular membranes. 

INTRODUCTION 

The HIV-1 Gag protein coordinates all major steps in virion as- 
sembly. In so doing, it changes subcellular localization and multi- 
merization state and becomes proteolytically processed (Bell 
and Lever, 2013; Sundquist and Krausslich, 2012). One function 
of Gag is to selectively package a dimeric, unspliced viral RNA 
genome selected from a pool of excess cellular RNAs and 
spliced viral mRNAs (Kuzembayeva et al., 2014; Lu et al., 
2011b; Rein et al., 2011). Genome packaging requires binding 
of the nucleocapsid (NC) domain to viral genomic RNA (Aldovini 
and Young, 1990; Berkowitz et al., 1993; Gorelick et al., 1990). 
Selection of the viral genome is thought to be governed by a 
cis-acting packaging element, psi (Ψ), within the 5' leader of
the viral genome, composed of sequences in the unique 5' region 
(U5) and between the tRNA primer binding site (PBS) and the 
5' portion of the Gag open reading frame (ORF) (Aldovini and 
Young, 1990; Clavel and Orenstein, 1990; Lever et al., 1989; 



Luban and Goff, 1994). This element is highly structured (Clever 
et al., 1995; Harrison and Lever, 1992; McBride and Panganiban, 
1996) and may exist in two conformations that favor translation
versus dimerization and packaging (Lu et al., 2011a). 

Knowledge of the viral RNA sequences that are directly 
bound by Gag is largely inferred from determinations of func- 
tional packaging signals in genetic studies, complemented by 
limited in vitro data. No assay has yet demonstrated a direct 
interaction between Gag and Ψ in a biologically relevant setting,
i.e., in cells or virions. Additionally, some findings suggest that
sequences outside Ψ might facilitate genome packaging. First,
disruption of Ψ does not eliminate specific RNA encapsida-
tion (Clever and Parslow, 1997; Laham-Karam and Bacharach,
2007; McBride and Panganiban, 1997). Second, sequences
outside Ψ can increase HIV-1 vector titers or virion RNA levels
(Berkowitz et al., 1995; Chamanian et al., 2013; Das et al., 1997;
McBride et al., 1997; Richardson et al., 1993). Third, virions
can package cellular RNAs (Muriaux et al., 2001; Rulli et al.,
2007), lacking a Ψ sequence, particularly in the absence of
viral RNA. The RNA properties that underlie these findings are 
unknown. 

HIV-1 Gag molecules exist as monomers or low-order multi- 
mers in the cell cytosol and form higher-order multimers only 
after binding to the plasma membrane (Kutluay and Bieniasz, 
2010). Imaging studies indicate that small numbers of Gag mol- 
ecules recruit a single viral RNA dimer to the plasma membrane, 
nucleating the assembly of thousands of Gag molecules into an 
immature virion (Jouvenet et al., 2009). Thereafter, Gag proteol- 
ysis liberates NC and other Gag domains, triggering virion matu- 
ration. Whether these changes in Gag/NC configuration affect its 
RNA-binding properties is unknown, but there is clear potential 
for the RNA binding specificity of Gag to change during virion 
genesis. 

Although NC is thought to be the primary Gag domain that 
binds to viral RNA, the matrix (MA) domain can also bind RNA 
in vitro (Alfadhli et al., 2009; Chukkapalli et al., 2010; Cimarelli
and Luban, 1999; Levin et al., 2010; Ott et al., 2005; Ramalingam
et al., 2011). N-terminal basic amino acids in MA that mediate 
membrane binding also drive in vitro RNA binding (Chukkapalli 
et al., 2010, 2013; Hill et al., 1996; Saad et al., 2006; Shkriabai 
et al., 2006; Zhou et al., 1994). Because RNA is better able to 
block MA binding to membranes that are devoid of phosphatidy-
linositol-(4,5)-bisphosphate [PI(4,5)P2] (Chukkapalli et al., 2010,
2013; Dick et al., 2013), RNA might help to target particle assem-
bly to the plasma membrane. However, RNA binding by MA is
not thought to be specific, and whether it actually occurs in cells 
is unknown. 

To obtain a complete account of the RNA sequences bound by 
Gag during virion genesis, we employed crosslinking-immuno-
precipitation-sequencing (PAR-CLIP and HITS-CLIP) techniques
(Hafner et al., 2010; Licatalosi et al., 2008). We find that cytosolic
Gag binds to three sequence elements within the 5' leader of the 
viral genome, which are brought together in a secondary struc-
ture that defines a minimal Ψ element. We also find that cyto-
solic Gag binds to additional discrete sites on the viral genome,
including the Rev Response Element (RRE). Gag association 
with the plasma membrane and its assembly into immature 
virions trigger a profound change in RNA binding specificity 
that favors genome packaging. Subsequently, particle matura- 
tion largely reverses this change. Finally, we find that MA is a 
bona fide RNA binding domain that selects a subset of tRNAs 
in the cytosol, which regulate Gag-membrane binding. Overall,
these studies provide a dynamic, quantitative, and high-resolu- 
tion account of the global changes in Gag-RNA binding during 
HIV-1 virion genesis. 

RESULTS 

CLIP Assay for HIV-1 Gag-RNA Binding 

We employed recently developed CLIP approaches (Hafner 
et al., 2010; Licatalosi et al., 2008) to identify RNA molecules 
bound by HIV-1 Gag protein during particle genesis. To facilitate 
the purification of Gag-RNA adducts, we generated HIV-1 NL4-3
(subtype B) and HIV-1 NDK (subtype D) proviral clones carrying
an inactivating mutation in the viral protease and three consecu-
tive copies of an HA-tag within the stalk region of MA (MA-3xHA/
PR-). The MA modification did not affect Gag expression or as-
sembly and had only a small effect on the infectiousness of a PR+
virus (Figures S1A and S1B available online). Cells transfected
with HIV-1 NL4-3(MA-3xHA/PR-) proviral plasmids were grown
in the presence of ribonucleoside analogs (4SU or 6SG), which
also had minimal effects on infectious virion yield (Figures S1C
and S1D).

Cells and virions were UV-irradiated, lysed, and digested with
ribonuclease A. Then, Gag-RNA adducts were immunopurified,
end-labeled with γ-32P-ATP, and visualized after SDS-PAGE
and transfer to nitrocellulose membranes. Gag-RNA adduct
formation was dependent on UV irradiation in cells and virions
(Figure 1A). We primarily used 4SU-based CLIP thereafter,
because it efficiently generated crosslinks, whereas 6SG and
unmodified RNA were used for confirmatory purposes. Gag-
crosslinked RNA oligonucleotides were purified, sequenced,
and mapped to the HIV-1 and human genomes (see Extended
Experimental Procedures). Reads derived from the terminal
repeat (R) region of the HIV-1 genome map ambiguously to the 5'
and 3' ends but are displayed at the 5' end of the viral genome,
and cautiously interpreted, in our analyses.

In six independent Gag-CLIP libraries prepared from HIV-1
NL4-3(MA-3xHA/PR-) or HIV-1 NDK(MA-3xHA/PR-) expressing
cells, 2.5%-7% of the total reads were HIV-1 derived, whereas
~60% were from host cell RNA (Figure 1B; Table S1). In compar-
ison, RNA sequencing (RNA-seq) libraries, which measure the
abundance of cellular and viral RNAs, contained 0.3%-1.6% of
reads derived from HIV-1, and ~75% were from cellular RNA
(Figure 1B; Table S1). In immature virions, ~50% and 20% of
CLIP reads were from viral and host RNAs, respectively (Fig-
ure 1B; Table S1), broadly similar to RNA abundance as deter-
mined by RNA-seq (Figure 1B). Thus, viral RNA sequences
were somewhat selectively bound by Gag in cells but were en-
riched to a far greater extent in virions.

HIV-1 RNA Sequences Bound by Gag in Cells and Virions 

We plotted the frequencies with which each nucleotide in the 
HIV-1 NL4-3 genome was represented in reads from cell- and
virion-derived HIV-1 NL4-3(MA-3xHA/PR-) Gag-CLIP libraries
(“read density”; Figure 1C). In cells, a major proportion of the
Gag-linked reads were derived from discrete sites in the viral 
genome. As might be expected, the 5' leader was a frequent 
site of Gag binding. However, additional sites of frequent Gag 
binding included the RRE, sequences overlapping the Nef start 
codon, and untranslated sequences in U3 (Figure 1C). Several 
other sites on the viral genome were bound by Gag, but 
at lower frequencies. The distribution of Gag-crosslinked viral 
RNA reads was highly reproducible in HIV-1 NL4-3(MA-3xHA/
PR-) Gag CLIP-seq libraries as indicated by the nearly perfect
correlation between independent experiments (Figure 1D).
However, when a divergent HIV-1 NDK strain was included in
these analyses, the only prominent Gag binding sites that 
were present in both strains were the 5' leader and RRE 
(Figure 1C). 
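
The "read density" plotted here is, in essence, a per-nucleotide coverage count. The following Python sketch illustrates that idea only; it is not the pipeline used in this study (for which see Extended Experimental Procedures). The function name read_density, the 1-based inclusive read coordinates, and the toy reads are all assumptions made for the example.

```python
from typing import Iterable, List, Tuple


def read_density(reads: Iterable[Tuple[int, int]], genome_length: int) -> List[int]:
    """Count, for every genome position, how many mapped reads cover it.

    Each read is a hypothetical (start, end) pair of 1-based inclusive
    coordinates on the viral genome.
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (genome_length + 2)
    for start, end in reads:
        if start < 1 or end > genome_length or start > end:
            raise ValueError(f"read ({start}, {end}) lies outside 1..{genome_length}")
        diff[start] += 1
        diff[end + 1] -= 1

    # A running sum of the difference array gives the coverage at each position.
    density = []
    running = 0
    for position in range(1, genome_length + 1):
        running += diff[position]
        density.append(running)
    return density


# Toy example: three made-up reads on a 10-nucleotide sequence.
toy_reads = [(1, 4), (3, 6), (3, 5)]
toy_density = read_density(toy_reads, 10)
print(toy_density)
```

Positions covered by many reads appear as peaks in such a profile, which is how discrete binding sites such as the 5' leader and RRE stand out from the rest of the genome.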

The frequency with which Gag was crosslinked to sites across 
the viral genome was starkly different in immature virions. Sites 
of frequent Gag binding on HIV-1 NL4-3 RNA were far more
numerous in immature virions than in cells. Moreover, specific
Gag binding to the 5' leader and RRE was not evident in imma-
ture virions (Figure 1C). The pattern of Gag binding frequency
across the viral genome was highly reproducible (Figure 1D)
and was unlikely to be generated by methodological bias during
CLIP-seq library generation, because it was largely unaffected
by the choice of ribonucleoside analog (4SU versus 6SG; Figures
S1E and S1F), ribonuclease (RNase A versus RNase T1; Figures
S1G and S1H), or the immunoprecipitating antibody (anti-HA
versus anti-NC; Figures S1I and S1J) used to generate the
CLIP libraries. HIV-1 NDK Gag similarly bound to sites throughout
the viral genome (Figure 1C). Despite overall similarity, there
were some clear discrepancies in viral RNA sites that were
frequently occupied by HIV-1 NL4-3 and HIV-1 NDK Gag (Figures
1E and S1K), presumably due to differences in target RNA
sequence, or subtle differences in the RNA binding specificities 
of the two Gag proteins. 
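
Reproducibility between libraries can be summarized by correlating two per-nucleotide binding profiles. The sketch below uses the Pearson coefficient as one plausible choice; the text does not state which statistic underlies the correlation analyses, so this is an assumption, as are the function name pearson_r and the toy profiles.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long binding profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of products of deviations from the mean.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


# Toy profiles (made-up numbers): replicate_b is replicate_a sequenced twice as
# deeply, so the two are proportional; the third profile has peaks elsewhere.
replicate_a = [1, 1, 3, 3, 2, 1, 0, 0]
replicate_b = [2, 2, 6, 6, 4, 2, 0, 0]
other_profile = [0, 0, 1, 0, 0, 3, 3, 1]

r_replicates = pearson_r(replicate_a, replicate_b)
r_other = pearson_r(replicate_a, other_profile)
print(r_replicates, r_other)
```

Because the coefficient is insensitive to overall scaling, libraries of different sequencing depth can be compared directly without prior normalization.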

The 3xHA tag did not affect the pattern of Gag binding at sites 
proximal or distal to the insertion site (Figure S1L). Moreover, 
autocorrelation analysis of Gag binding frequency revealed no 
peaks other than at a separation of s = 0 (Figure S1M). Thus,
the frequency with which a given site in the viral genome was
bound by Gag in immature virions appeared unaffected by its
position relative to the Ψ sequence or other Gag binding sites.
Rather, Gag binding appeared to be a function of local nucleo- 
tide sequence or structure. 
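
An autocorrelation analysis of this kind asks whether binding at one position predicts binding at a position s nucleotides away. The following Python sketch shows one common normalized estimator; the exact estimator used for Figure S1M is not given in the text, so the formula, the function name autocorrelation, and the toy profile are assumptions.

```python
from typing import Dict, Sequence


def autocorrelation(profile: Sequence[float], max_separation: int) -> Dict[int, float]:
    """Normalized autocorrelation of a binding profile.

    Returns a mapping from separation s (in nucleotides, from -max_separation
    to +max_separation) to a coefficient that equals 1 at s = 0.
    """
    n = len(profile)
    if not 0 <= max_separation < n:
        raise ValueError("max_separation must be between 0 and len(profile) - 1")
    mean = sum(profile) / n
    centered = [value - mean for value in profile]
    variance = sum(c * c for c in centered)
    if variance == 0:
        raise ValueError("autocorrelation is undefined for a constant profile")

    result = {}
    for s in range(-max_separation, max_separation + 1):
        shift = abs(s)  # the estimator is symmetric in s
        total = sum(centered[i] * centered[i + shift] for i in range(n - shift))
        result[s] = total / variance
    return result


# Toy profile (made-up numbers) with two peaks five positions apart.
toy_profile = [0, 1, 5, 1, 0, 0, 2, 6, 2, 0, 0, 1]
toy_acf = autocorrelation(toy_profile, 5)
for separation in sorted(toy_acf):
    print(separation, round(toy_acf[separation], 3))
```

A profile with regularly spaced binding would show secondary peaks at the corresponding separations; a single peak at s = 0 is the pattern expected when binding at each site is independent of its neighbors.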

Figure 1. Viral RNA Binding by Gag in Cells and Virions

(A) Gag-RNA adducts immunoprecipitated from HIV-1 NL4-3(MA-3xHA/PR-)-expressing, 4SU- or 6SG-fed, cells or progeny virions, visualized by autoradiography
(upper). Western blot analysis of the same membranes (lower).

(B) Proportion of reads that map to cellular and viral genomes obtained from Gag-CLIP and RNA-seq experiments performed on cell lysates and immature virions.
Each point represents a separate library.

(C) Frequency distribution of nucleotide occurrence (read density) in reads mapping to viral genomes. A schematic diagram of HIV-1 genome features shown
above is colinear.

(D) Correlation analysis of HIV-1 NL4-3 Gag binding to viral RNA in cells and virions from independent CLIP experiments.

(E) Correlation analysis of HIV-1 NL4-3 versus HIV-1 NDK Gag binding to viral RNA in virions.

See also Figure S1 and Table S1.



Analysis of Gag Binding to 5' Leader and RRE

In cells, the most prominent Gag binding sites on the viral 
genome coincided with the most prominently structured ele- 
ments in the HIV-1 genome, namely, the 5' leader and the RRE 
(Figure 1C). However, Gag was not bound across the 5' leader 
and the RRE with uniform frequency, but selectively associated 
with small determinants within these structures. In the case of 
the 5' leader, Gag was most frequently bound to three distinct
sequences, including one at the 5' end of U5, a second site be-
tween the PBS and the major splice donor, and a third site 3' to
the major splice donor (Figure 2A). These sites are separated
from each other in linear sequence by approximately 100 nucleotides.
Strikingly, however, a nuclear-magnetic-resonance-based analysis of
the structure of a Ψ sequence (Lu et al., 2011a) predicts that these
three Gag-binding sequences would be in close proximity, and partly
base paired with each other upon RNA folding into a structure that
favors genome packaging (Figure 2B).

Frequent Gag binding to the RRE occurred in stem I, a site 
clearly distinct from the primary site of Rev-RRE interaction in 
stem-loop IIB (Malim et al., 1990) (Figures 2C and 2D). Impor- 
tantly, the Gag-bound reads derived from both the 5' leader 
and RRE contained high rates of T-to-C substitution, identifying 
individual nucleotide bases that were in close proximity to Gag 

Figure 2. Details of HIV-1 Gag Binding to Ψ and RRE

(A and B) Read density distribution (red) and T-to-C mutations (blue) within the 5' leader (nucleotides 1-400, A) and RRE (B) in Gag-CLIP experiments with
HIV-1 NL4-3 and HIV-1 NDK.

(C and D) Depiction of read density distribution in the context of predicted secondary structures of the 5' leader (nucleotides 1-400, C) and RRE (D) in Gag-CLIP
experiments with HIV-1 NL4-3 and HIV-1 NDK. Color intensity represents read density indicated in the key.

See also Figure S2. 



(Figures 2A and 2C). Additionally, CLIP-seq experiments per- 
formed using 4SU or 6SG yielded a similar footprint of Gag on 
viral RNA, with 5' leader and RRE binding prominently featured 
(Figures S2A and S2B). Thus, the apparently specific binding 
of Gag to these sequence elements was not an artifact of the 
particular crosslinking nucleotide used. 
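
T-to-C substitutions in 4SU-based CLIP reads mark positions where the reference has a T but the sequenced read has a C. The sketch below shows how such positions might be tallied; it assumes ungapped alignments with known 1-based start positions and is not the authors' analysis code. The names t_to_c_positions and tally_t_to_c, the toy reference, and the toy reads are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def t_to_c_positions(reference: str, read: str, read_start: int) -> List[int]:
    """Return 1-based reference positions where the reference has T and the read has C.

    Assumes the read aligns to the reference without gaps, starting at read_start.
    """
    if read_start < 1 or read_start - 1 + len(read) > len(reference):
        raise ValueError("read does not fit within the reference")
    positions = []
    for offset, read_base in enumerate(read):
        ref_base = reference[read_start - 1 + offset]
        if ref_base.upper() == "T" and read_base.upper() == "C":
            positions.append(read_start + offset)
    return positions


def tally_t_to_c(reference: str, reads: Iterable[Tuple[str, int]]) -> Counter:
    """Count T-to-C substitutions per reference position over many aligned reads."""
    counts: Counter = Counter()
    for sequence, start in reads:
        counts.update(t_to_c_positions(reference, sequence, start))
    return counts


# Toy data (made-up sequences): two reads carry a T-to-C change at position 5,
# and one read carries a change at position 11.
toy_reference = "ACGTTGCATGTC"
toy_aligned_reads = [("GTCG", 3), ("TCGC", 4), ("CATGCC", 7)]
toy_counts = tally_t_to_c(toy_reference, toy_aligned_reads)
print(sorted(toy_counts.items()))
```

In practice, the substitution count at each position would be compared with the read density there, so that heavily covered positions are not mistaken for crosslinking hot spots.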



Changes in Gag RNA Binding to Viral RNA during HIV-1 
Virion Assembly 

Given the stark differences in the interaction of Gag with viral 
RNA in cells and immature virions, we attempted to determine 
what triggers this dramatic change. To this end, cells were frac- 
tionated after UV irradiation (Figure 3A), and Gag-RNA adducts 



Cell 159, 1096-1109, November 20, 2014 ©2014 Elsevier Inc. 1099




[Figure 3 image. Read-density panels: Cytosol, Immature virions (x axis: nucleotide position). Correlation panels: Cytosol vs. Memb., Memb. vs. Virion, Cytosol vs. Virion (x axis: separation in nucleotides).]



Figure 3. Changes in Gag Binding to Viral RNA during Virion Assembly and Maturation 

(A) Western blot analysis of Gag, and markers of cytosol (LC3A/B) and plasma membrane (Na-K ATPase) in fractionated, 4SU-fed 293T cells transfected with an
HIV-1 NL4-3(MA-3xHA/PR-) proviral plasmid.

(B) Autoradiogram of Gag-RNA complexes recovered from fractions in CLIP assays. 

(C) Read density distribution on viral RNA from CLIP experiments in which WT and mutant Gag proteins were immunoprecipitated from cell fractions. 

(D) Correlation analysis of CLIP data from cell fractions and immature virions. 

(E) Autoradiogram of NC-RNA and Gag-RNA complexes recovered from mature and immature virions using an anti-NC antibody. 

(F) Read density distribution on viral RNA from CLIP experiments in which Gag and NC proteins were immunoprecipitated from immature and mature virions using 
an anti-NC antibody. 

(G) Correlation analysis of CLIP data from cells and mature and immature virions. 

See also Figure S3. 



were immunoprecipitated from cytosol, in which Gag is primarily 
monomeric, and from a membrane fraction where multimerized 
assembly intermediates form (Kutluay and Bieniasz, 2010) (Fig-
ure 3B). The RNA binding profile of cytosolic Gag was nearly
indistinguishable from that described above for total cell lysates,
with Ψ and RRE representing specific binding sites (Figures 3C



and 3D). Similarly, a myristoylation-defective Gag mutant 
(Gag-G2A), which does not bind efficiently to membranes, was 
also restricted to discrete binding sites on viral RNA (Figure S3A). 
Conversely, Gag that was immunoprecipitated from membrane 
fractions was bound to many sites on the viral genome, and, 
although binding to Ψ and RRE remained prominent, the overall

Figure 4. Gag Binding to Viral RNA in the
Absence of Ψ

(A) Western blot analysis (anti-HA) of Gag protein
expression from WT and Ψ-deleted (Δ105-278
and Δ105-278/Δ301-332) proviral plasmids.

(B) Proportion of reads from cell and virion CLIP
experiments that map to cell and viral genomes,
following transfection of WT or Ψ-deleted proviral
plasmids. Each point represents a separate
library.

(C) Read density distribution on viral RNA from
CLIP experiments in which Gag proteins were
immunoprecipitated from cell lysates and imma-
ture virions following transfection with Ψ-deleted
proviral plasmids.

(D) Correlation analysis of CLIP data from cells and
virions using WT and Ψ-deleted proviral plasmids.
See also Figure S4.

RNA binding profile of cell-membrane-associated Gag other-
wise resembled that of immature virion-associated Gag (Figures
3C, 3D, and S3B). A late-domain mutant Gag protein (Δp6) was
similarly bound to sites throughout the viral genome (Figure 3C),
indicating that the completion of budding was not required for
this apparent shift in RNA binding specificity. Notably, a Gag
CAΔCTD mutant that is fully competent to localize to the plasma
membrane but is significantly impaired in the formation of high-
order multimers (Kutluay and Bieniasz, 2010) exhibited a binding
profile that was more reminiscent of cytosolic Gag, although
additional sites in the viral genome were bound with some
prominence (Figure 3C). Together, these data indicate that
high-order oligomerization of Gag at the plasma membrane
drives a profound change in Gag’s RNA-binding properties (Fig-
ures 3C, 3D, and S3B), dramatically increasing the extent to which
Gag and viral RNA interact with each other.

Changes in Gag-Viral RNA Binding Triggered by Virion 
Maturation 

Gag undergoes proteolytic cleavage during virion maturation, 
liberating NC, which is then thought to condense with the viral 
RNA inside a remodeled conical core (Sundquist and Krausslich,
2012). To determine whether virion maturation affects Gag-viral RNA
interactions, we performed CLIP experiments using an antibody that
specifically recognizes NC (Figures S3D and S3E) and could efficiently
immunoprecipitate Gag-RNA and NC-RNA adducts from immature and
mature virions, respectively (Figure 3E). Comparison of RNA sequences
bound by Gag in immature virions and NC in mature virions revealed
profound changes accompanying virion maturation (Figures 3F and 3G).
Unlike intact Gag in immature virions, NC in mature virions preferentially
occupied discrete sequences on the viral genome that coincided in large
part with the major sites of Gag-viral RNA interaction in the cytosol
(i.e., Ψ and RRE) (Figures 3F and 3G). Many
of the additional prominent NC-binding sites in mature virion
RNA were also bound by intact Gag in the cytosol at lower fre-
quencies (Figure S3F). Thus, the RNA-binding properties of NC
in mature virions resemble those of unassembled Gag in the
cytosol to a surprising extent, exhibiting statistically significant
correlation (Figure 3G), possibly reflecting the monomeric state
of cytosolic Gag and mature NC.

Gag Binding to Viral RNA and Viral RNA Packaging in the
Absence of Ψ or RRE

To test the importance of the Gag/NC-binding sites in Ψ and
RRE, we generated viruses carrying deletions of these se-
quences. First, we determined whether the Gag binding to addi-
tional sites on the viral genome in cells and in virions required
initial binding to Ψ. To this end, we performed CLIP experiments
using viral constructs carrying deletions of two of the three
Gag-binding sequences (Δ105-278), or all three Gag binding se-
quences (Δ105-278/Δ301-332) within Ψ. These deletions left
sequences surrounding the major 5' splice donor intact, and
only modestly reduced Gag expression (Figure 4A). Surprisingly,
in cells, neither the Δ105-278 nor the Δ105-278/Δ301-332
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mutation affected the fraction of Gag-crosslinked reads derived 
from viral versus human genomes (Figure 4B). Moreover, in 
immature virions, these W-deletions caused only a modest ~3- 
fold decrease in the fraction of reads derived from the viral 
genome, with a corresponding increase in the fraction of reads 
from cellular RNAs (Figure 4B). The pattern of Gag binding to viral 
RNA in cells was not greatly affected by the W deletions (Figures 
4C and 4D), although there was an apparent reduction in the 
frequency with which RRE was bound. Similarly, the overall 
pattern of Gag binding to viral RNA in immature virions was 
also relatively unperturbed by W deletions (Figures 4C, 4D, and 
S4A). However, there was a tendency for read densities to 
increase toward the 3' end of the viral genome, perhaps reflect- 
ing the incorporation of spliced viral RNAs into virions generated 
by W-deleted genomes (Clever and Parslow, 1997; Houzet et al., 
2007; Russell et al., 2003). Importantly, these results indicate 
that multiple sites on the viral genome bind to Gag independently 
of W and facilitate HIV-1 genome packaging. 

To analyze the effects of Gag-RRE interaction on HIV-1 infectivity,
we generated viruses in which Rev was deleted and the
nuclear export of unspliced viral RNAs was mediated by the
Mason-Pfizer monkey virus (MPMV) constitutive transport element
(CTE) to circumvent effects of RRE on RNA nuclear export (Figure S4B).
In this setting, the presence or absence of RRE had no
discernible effect on Gag protein levels or infectious virus yield
(Figure S4C). Although it is formally possible that the MPMV
CTE might recapitulate the Gag-binding properties of RRE, this
analysis suggests that Gag-RRE interactions do not regulate
Gag expression and are not required for RNA packaging.

Changes in the Sequence Specificity of Gag RNA 
Binding during Virion Genesis 

The aforementioned results suggest a degree of redundancy, and 
contributions from multiple domains of the viral genome to RNA 
packaging. In an attempt to explain the factors driving the selec- 
tivity with which Gag bound to and packaged viral RNA, we deter- 
mined the identities of cellular mRNA sequences that were most 
frequently bound to Gag in cells and in virions. Reads that aligned 
to the human genome were clustered using PARalyzer (Corcoran
et al., 2011), which defines a cluster, or binding site, based on the
occurrence of a minimum number of overlapping reads proximal to
a T-to-C substituted crosslinking site in 4SU-based CLIP assays
(Table S2). Note that a single cluster is counted once for the
analysis shown in Figure 5A, irrespective of the number of reads
associated with it. In cells, >90% of clusters (Gag binding sites)
were within genes, and ~80% of these were derived from mRNAs
(Figure 5A) with an overlap of >95% between HIV-1 NL4-3 and
HIV-1 NDK Gag-bound mRNA clusters (Figure S5A). A similar
analysis of immature virions generated a collection of clusters
that were also mostly derived from mRNAs (Figure 5A) and overlapped
by >70% in HIV-1 NL4-3 and HIV-1 NDK libraries. Notably,
greater discrepancies were observed when the Gag-binding 
clusters in cells versus virions were compared (Figure S5A), 
indicating that many mRNA sequences that are preferred Gag 
binding sites in cells are not preferred binding sites in virions 
and vice versa. 
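
The cluster overlap comparisons described above can be illustrated with a minimal sketch. It assumes that each cluster is reduced to a genomic interval and that two clusters "overlap" when they share at least one base on the same chromosome and strand; the function name and the tuple layout are hypothetical, and this is not PARalyzer output handling or the in-house analysis used in the study.

```python
from collections import defaultdict


def fraction_overlapping(clusters_a, clusters_b):
    """Return the fraction of clusters in A that overlap any cluster in B.

    Each cluster is a (chrom, strand, start, end) tuple with half-open
    coordinates. This tuple layout is an assumption made for illustration.
    """
    # Index the B intervals by chromosome and strand for quick lookup.
    by_key = defaultdict(list)
    for chrom, strand, start, end in clusters_b:
        by_key[(chrom, strand)].append((start, end))

    hits = 0
    for chrom, strand, start, end in clusters_a:
        # Two half-open intervals overlap when each starts before the other ends.
        if any(s < end and start < e for s, e in by_key[(chrom, strand)]):
            hits += 1
    return hits / len(clusters_a) if clusters_a else 0.0
```

A linear scan per cluster is adequate for a sketch; for genome-scale cluster sets, a sorted or tree-based interval index would be the usual choice.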

We counted the number of reads associated with each cluster 
and determined the 100 sites in cellular mRNAs that were most
frequently bound by Gag in cells and immature virions. The
nucleotide composition of these Gag binding sites revealed a 
striking change in the RNA binding preference of Gag during 
virion genesis (Figure 5B). In cells, preferred Gag binding sites 
had a strong tendency to be G-rich (mean G-content of 
~40%). In contrast, in immature virions, preferred Gag binding 
sites displayed a tendency to be A-rich. Remarkably, the nucle- 
otide content of the preferred Gag binding sites in cellular 
mRNAs associated with immature virions was strikingly similar 
to the nucleotide composition of the HIV-1 genome. 
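
As an illustration of the nucleotide composition comparison described above, the following sketch pools a set of binding-site sequences and reports the fraction of each base. The function name is hypothetical, and treating T and U as equivalent is an assumption made so that DNA-encoded reads and RNA sequences can be mixed.

```python
from collections import Counter


def base_composition(sequences):
    """Return the pooled fraction of A, C, G, and U across all sequences.

    T is counted as U; any other character (for example N) is ignored.
    """
    counts = Counter()
    for seq in sequences:
        rna = seq.upper().replace("T", "U")
        counts.update(base for base in rna if base in "ACGU")
    total = sum(counts.values())
    if total == 0:
        return {base: 0.0 for base in "ACGU"}
    return {base: counts[base] / total for base in "ACGU"}
```

Applying the same function to the top-ranked clusters and to the viral genome sequence would give directly comparable compositions.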

We used cERMIT (Georgiev et al., 2010) to search for
sequence motifs that were enriched in Gag-bound cellular
mRNA clusters recovered at various stages of virion genesis.
G/U-rich sequence motifs were most often present in host
mRNA sequences bound to Gag in the cytoplasm (Figure 5C).
At the plasma membrane, and particularly in immature virions, 
there was a clear change in binding specificity. Here, Gag 
bound more frequently to A-rich and A/G-rich cellular mRNA 
motifs (Figure 5C). Inspection of the sequences of the 12 clus- 
ters most frequently bound by Gag in the viral genome, in imma- 
ture virions, revealed multiple instances of similar A/G-rich 
motifs scattered in the viral genome (Figure 5D). In mature vi- 
rions, RNA binding specificity reverted, and the sequence mo- 
tifs favored by NC were again G/U-rich. These findings reinforce 
the notion that Gag binds preferentially to RNAs that have an 
A-rich nucleotide composition, and particularly to A-rich motifs 
that are present in the viral genome, transiently during immature 
virion assembly. 

tRNAs Are the Most Frequently Gag-Bound RNA in Cells, 
but Not in Virions 

Although the aforementioned analysis focused on Gag binding 
to mRNA sequences, when the number of individual Gag-bound 
reads associated with each cluster or binding site was counted, 
it was evident that mRNAs were responsible for only a fraction 
(~12%) of the cellular RNA reads crosslinked to Gag. In fact, 
tRNAs were the dominant RNA species bound by Gag in cells 
(~60%-70% of reads), with 7SL RNA, a known component of
HIV-1 virions (Onafuwa-Nuga et al., 2006), constituting the bulk
of the remaining reads (Figure 6A). Strikingly, the fraction of
tRNA-derived reads was greatly reduced in CLIP-seq libraries
prepared from virions as compared to cells. This was not the 
case for other RNA classes, suggesting that tRNAs were specif- 
ically excluded from Gag-bound RNAs in virions (Figure 6A). 
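
The read-level classification described above amounts to counting reads per RNA class and normalizing by the library size. A minimal sketch follows, assuming each mapped read has already been assigned a class label; the function name and the label strings are hypothetical.

```python
from collections import Counter


def class_fractions(read_classes):
    """Return the fraction of reads assigned to each RNA class.

    read_classes is an iterable with one label per mapped read,
    for example "tRNA", "7SL", or "mRNA".
    """
    counts = Counter(read_classes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {rna_class: n / total for rna_class, n in counts.items()}
```

Comparing the value returned for a given class in a cell library with that in a virion library gives the change in representation of that class.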

MA Binds to Particular tRNAs 

Although the aforementioned Gag-RNA binding events were 
assumed to involve the NC domain of Gag, there are numerous 
reports that MA can interact with viral or cellular RNAs (reviewed 
in Alfadhli and Barklis, 2014). However, these interactions,
generally assayed in vitro, are of uncertain significance. To test
whether MA binds to RNA in cells, we modified the aforementioned
HIV-1 NL4-3 (MA-3xHA/PR−) clones to include a FXa protease
cleavage site immediately N- or C-terminal to the 3xHA-tag
(Figures 6B and S6A). After UV crosslinking, cell lysates could be 
treated with FXa protease to generate two Gag fragments (MA 
and CA-NC-p6) only one of which would be tagged and immuno- 
precipitated (Figures 6B and 6C). 

Figure 5. Changes in Gag-RNA Binding Specificity during Virion Genesis

(A) Classification of PARalyzer-generated read clusters that map to cellular RNAs; total number of clusters is indicated. 

(B) Nucleotide composition of 100 PARalyzer-generated clusters with largest numbers of reads, derived from Gag-bound host mRNAs in cells and immature 
virions. Larger white circle symbols indicate nucleotide composition of the HIV-1 genome. 

(C) Sequence motifs identified by cERMIT as most frequently occurring in Gag-bound host cell mRNA clusters at various steps of virion genesis. Cumulative
percentages of clusters containing the motifs are indicated.

(D) Sequences of the 12 viral genome-derived clusters that are most frequently Gag-bound in immature virions. Motifs matching those identified as preferred 
binding sites in host mRNAs are highlighted in blue. 

See also Figure S5 and Table S2. 



Like full-length Gag, the immunoprecipitated CA-NC-p6 con- 
tained crosslinked RNA species that caused adducts to migrate 
~0-10 kDa above the expected molecular weight of the protein.
Notably, the isolated MA domain also generated prominent 
crosslinked species, indicating RNA binding. Unusually, these 
MA-RNA adducts migrated at a discrete molecular weight of 
~35 kDa, suggesting that MA bound to a unique RNA species 
(Figure 6D). 

Although the CA-NC-p6-RNA adducts contained viral RNA sequences
with a very similar distribution to those recovered from
full-length Gag, MA-RNA adducts were nearly devoid of viral
RNA-derived sequence (Figure 6E). In fact, the RNA molecules 
crosslinked to MA were almost exclusively host tRNAs (Fig- 
ure 6F). When RNase treatment was omitted from the CLIP pro- 
cedure prior to immunoprecipitation, MA-RNA adducts migrated 
at a larger, but still discrete, molecular weight (Figure 6D) match- 
ing the size of one tRNA molecule (~20 kDa) crosslinked to MA, 
suggesting that MA binds to full-length tRNAs. 

Figure 6. HIV-1 MA Binds to Specific tRNAs, but Not Viral RNA

(A) Classification of individual reads from cell and immature virion Gag-CLIP experiments that map to genes in the host cell genome.

(B) Schematic representation of FXa-cleavable Gag proteins.

(C and D) Western blot analysis (anti-HA, C) and autoradiography (D) of immunoprecipitated (anti-HA) proteins from HIV-1 NL4-3 MA-FXa-3xHA/PR−- and MA-3xHA-FXa/PR−-transfected cells that were subjected to the modified CLIP procedure. Cell lysates were treated with Factor Xa protease, with or without RNase A
prior to immunoprecipitation. *Breakdown products of the 3xHA-CA-NC-p6 protein that retain the HA tag but are not crosslinked to RNA.

(E) Read density distribution on viral RNA following CLIP analysis of FXa-liberated 3xHA-CA-NC-p6 and MA-3xHA proteins.

(F) Classification of individual reads that map to cellular genes from cell-associated Gag-CLIP experiments employing the FXa-liberated MA-3xHA protein or WT
and mutant, uncleaved Gag proteins. Total number of reads is indicated below.

(G) Numbers of reads crosslinked to FXa-liberated MA-3xHA or uncleaved Gag proteins that lack NC (ΔNC) and MA (ΔMA) that map to each tRNA gene.

(H) Two examples of read density distributions (red) and T-to-C substitutions (blue) at crosslinking sites on tRNAs (Gly[GCC] and Lys[TTT]) following CLIP analysis
of FXa-liberated MA-3xHA.

See also Figure S6.

Although tRNAs constituted the majority of cellular RNA molecules
bound by intact Gag in cells, deletion of MA caused a major
reduction in the fraction of reads that were tRNA derived but did
not inhibit binding to mRNAs or 7SL RNA (Figures 6F and S6B).
Conversely, a mutant Gag protein lacking the NC domain did not 
bind mRNA or 7SL, but bound nearly exclusively to tRNAs (Figures 6F and S6B).
Progressive and specific loss of tRNA binding
occurred as more lysines in the N-terminal MA basic domain
were substituted, and mutation of six lysines (MAK6T) reduced
Gag-tRNA binding to the same degree as deletion of MA (Figures
6F and S6B). As expected, MAK4T and MAK6T mutations
caused decreases in the levels of released particles (Figure S6B),
consistent with a plasma membrane targeting defect in these
mutants.

Figure 7. Reciprocal Relationship between HIV-1 Gag Binding to RNA and Membrane during Virion Genesis

(A) Western blot analysis (anti-CA and anti-β-tubulin) of HOS cell lysates, fractionated by membrane flotation after treatment with RNase.

(B) Quantitation of Gag and β-tubulin in the cytosol and membrane fractions from five independent experiments each with HOS and HeLa cells.

(C) Classification of individual reads that map to genes in the host cell genome from Gag-CLIP experiments done using fractionated cells and WT or mutant Gag
proteins as described in Figure 3. Total number of reads is indicated below.

(D) Model summarizing changing interactions between Gag and RNA during virion genesis.

MA binding to tRNAs was highly selective. Indeed, GluCTC, 
GluTTC, GlyGCC, GlyCCC, LysCTT, LysTTT, ValAAC, and 
ValCAC tRNAs were bound up to 100-fold more frequently 
than the majority of tRNAs (Figure 6G). No such enrichment 
occurred in CLIP experiments done with GagΔMA. Although
intact tRNA molecules were bound by MA (Figure 6D), binding 
apparently involved the 5' half of the tRNA molecules, and espe- 
cially the dihydrouridine loop, as indicated by the very high rates 
of T-to-C conversions at this site (Figure 6H). Binding was not 
induced by 4SU incorporation into tRNAs, because the same 
tRNAs were selectively MA-bound in CLIP experiments where 
crosslinking was induced by UV irradiation at 254 nm in the 
absence of modified nucleotides (Figure S6C). 
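
To make the use of T-to-C conversions as a crosslinking readout concrete, the following sketch computes, for each T in a reference sequence, the fraction of covering reads that carry a C at that position. It assumes ungapped alignments supplied as (start, read) pairs with non-negative 0-based starts and DNA-encoded sequences; the function name and input layout are hypothetical and do not reproduce the pipeline used in the study.

```python
from collections import Counter


def t_to_c_rates(reference, alignments):
    """Return {position: T-to-C rate} for every covered T in the reference.

    alignments is a list of (start, read_seq) pairs, where start is the
    0-based reference position of the first read base and the alignment
    contains no gaps (an assumption made to keep the sketch short).
    """
    coverage = Counter()
    conversions = Counter()
    for start, read in alignments:
        for offset, base in enumerate(read):
            pos = start + offset
            # Only reference T positions can show a T-to-C conversion.
            if pos >= len(reference) or reference[pos] != "T":
                continue
            coverage[pos] += 1
            if base == "C":
                conversions[pos] += 1
    return {pos: conversions[pos] / coverage[pos] for pos in coverage}
```

Positions with a high rate and adequate coverage would be the candidate crosslinking sites, such as those reported in the dihydrouridine loop.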

Regulation of Gag Membrane Binding by tRNA 

The finding that the same lysine residues that mediate plasma 
membrane binding also mediate tRNA binding suggested that
tRNAs might regulate Gag localization. Indeed, previous work 
has indicated that exogenous RNA can inhibit Gag binding to 
liposomes in vitro (Chukkapalli et al., 2010, 2013; Dick et al., 
2013). We took a simple cell-based approach (without addition 
of exogenous RNA or liposomes) and tested whether RNase
treatment of cell lysates increased Gag binding to endogenous
cellular membranes. Lysates of HeLa and HOS cells stably 
expressing Gag-CFP were treated with ribonucleases and sub- 
jected to membrane flotation analysis. Strikingly, RNase treat- 
ment caused a significant redistribution of Gag from cytosol to 
membrane fractions (Figures 7A and 7B), whereas a control protein,
β-tubulin, was largely unaffected. Consistent with a model
in which tRNAs compete with membranes for Gag binding, 
tRNAs comprised a significantly smaller fraction of Gag-bound 
RNAs at the plasma membrane than in the cytoplasm (Figure 7C). 
This difference was more pronounced when particle budding 
was blocked at the plasma membrane by deletion of the late 
budding domain of Gag (Figure 7C). Thus, these results indicate 
that MA can bind to tRNA or cell membranes, but not both 
simultaneously, and strongly suggest that tRNAs regulate Gag 
localization by binding to basic amino acids in MA. 

DISCUSSION 

Two central conclusions of these studies are (1) the HIV-1 Gag 
protein has two RNA binding domains (NC and MA) with very 
different specificities and (2) dramatic changes in RNA binding 
regulate Gag localization and genome packaging during virion 
genesis (Figure 7D). 

Prior to virion assembly, Gag exists as a diffuse pool of monomers
or low-order multimers in the cytoplasm with its NC domain
bound primarily to mRNA, with some binding to 7SL and tRNA
(Figure 7D). A fraction of NC-RNA interactions are with particular
sites on the HIV-1 genome, including the Ψ sequence. Our data
reveal the specific RNA sequences within Ψ that are in proximity
to Gag, in a physiological setting. Satisfyingly, the three noncontiguous
RNA elements that are most frequently crosslinked to
Gag (nucleotides ~100-126, 195-260, and 300-350) coincide
nearly precisely with a minimal Ψ element, which adopts a secondary
structure that putatively favors genome packaging (Lu
et al., 2011a). 

Surprisingly, cytoplasmic Gag bound to additional discrete 
elements on the viral RNA, including RRE. Previously, env se- 
quences, including RRE, were shown to facilitate packaging 
(Kaye et al., 1995; Richardson et al., 1993), but a discrete pack- 
aging sequence within env could not be mapped. Other reports 
suggest that Rev enhances packaging (Brandt et al., 2007), 
although effects of Rev/RRE on viral RNA nuclear export are 
potential confounders in packaging experiments. Despite strong 
evidence for Gag-RRE binding, the RRE did not affect infectious 
virion yield when its nuclear export function was replaced, sug- 
gesting that Gag-RRE binding is not required for genome pack- 
aging. Although it is possible that Gag/NC-RRE interaction plays 
a redundant role in packaging, other plausible functions include 
(1) shielding double-stranded stem I RNA from cytoplasmic sen- 
sors, (2) coupling RNA-export with packaging, (3) displacement 
of Rev from the RRE for recycling, (4) regulation of Env transla- 
tion, and (5) enhancement of reverse transcription via NC’s 
unwinding/chaperone function (Levin et al., 2010). Further work 
will be required to elucidate the functional role, if any, of Gag/ 
NC-RRE binding. 

In the cytosol, Gag favored binding to the viral genome over
cellular mRNAs by a few fold. This level of discrimination is insuf- 
ficient to account for the selectivity with which viral genomes are 
packaged. Our data indicate that a dramatic change in Gag-RNA 
binding specificity, coincident with CA-CTD-dependent high- 
order multimerization at the plasma membrane, contributes to 
selective packaging (Figure 7D). GU-rich sequences in cellular 
mRNA were targeted by Gag in the cytosol, consistent with pre- 
vious in vitro and structural studies indicating that the isolated, 
monomeric NC domain favors binding to such sequences 
(Berglund et al., 1997; De Guzman et al., 1998; Fisher et al., 
1998). However, during assembly, Gag molecules become
tightly packed in hexameric lattices (Briggs et al., 2009; Wright 
et al., 2007). By constraining thousands of NC domains into a 
pseudo-2D curved array, local NC concentration is dramatically 
elevated. Potentially, features of NC that govern RNA binding 
specificity (zinc knuckles and basic amino acids) might be differ- 
ently accessible in an assembled Gag lattice. Under these 
conditions, we found that A-rich mRNA sequences were prefer- 
entially bound by Gag (Figure 7D). Remarkably, the nucleotide 
composition of mRNA sequences bound by assembled Gag re- 
flects an unusual, heretofore unexplained, property of the HIV-1 
genome. Thus, our findings suggest that a need to selectively 
package viral RNA caused HIV-1 to evolve an unusually A-rich 
genome. Conversely, an A-rich genome may have evolved for
other reasons and then driven Gag to gain a unique oligomerization-driven
specificity for A-rich RNA. Notably, Ψ deletion
caused only a 3-fold reduction in the fraction of virion-associated,
Gag-bound RNA sequences that were viral RNA derived.
Moreover, the pattern of Gag binding to many sites in the viral 
genome was not solely a secondary effect of physical proximity 
to Gag-W interactions. Rather, it appears that HIV-1 genome 
packaging is a two-step process, involving interactions between 
(1) Ψ and monomeric Gag and (2) A-rich viral RNA and multimeric
Gag. This scenario should selectively drive particle assembly
on viral RNAs, and we speculate that the biases in nucleotide
composition exhibited by HIV-1 and other retroviruses serve
as a proofreading-like mechanism to enhance the fidelity of
genome packaging following initial Gag-Ψ interaction.

An unexpected finding was that proteolytic cleavage of Gag 
caused NC to revert to a preference for GU-rich mRNAs and 
discrete viral RNA sequences. This result reinforces the notion 
that Gag/NC RNA binding specificity is multimerization depen- 
dent. By liberating the majority of viral RNA from NC, while 
maintaining interaction with structured elements (to enable 
NC’s chaperone activity), maturation-dependent changes in 
Gag/NC-RNA binding could facilitate reverse transcription (Levin 
et al., 2010). 

Another surprising finding was that MA binding to specific 
tRNAs constitutes the most frequent binding events between 
Gag and RNA in cells. MA-tRNA binding was independent of 
NC and the PBS and is thus unlikely to involve the tRNA primer 
annealed to the viral genome. Rather, we found that MA-tRNA interaction
could regulate the binding of Gag to cellular membranes.
MA specifies the location of virion assembly, and it was previ- 
ously shown that RNA can block in vitro MA binding specifically 
to liposomes that lack acidic phospholipids (Chukkapalli et al., 
2010, 2013). Thus, occlusion of MA basic residues by specific 
tRNAs might inhibit nonproductive assembly at most intracellular 
membranes and facilitate targeting to the plasma membrane 
where resident lipids have a high affinity for MA. Alternatively, 
MA-tRNA binding might provide a mechanism by which virion 
assembly is temporally regulated. 

MA-tRNA interactions could serve additional purposes. Some 
degree of RNA binding may be an inevitable consequence of encoding
a highly basic domain. Thus, specific MA binding in a 1:1
complex to small RNAs might be a mechanism to avoid the
aggregation of a protein that has two distinct RNA binding
domains and a tendency to multimerize. MA-tRNA binding might
also prevent nonproductive interaction of a viral genome with 
a Gag monomer whose NC domain has engaged viral RNA. 
Conceivably, MA may facilitate the selection of the RT primer, 
as tRNALys3 is among the tRNAs bound by MA, but several other
tRNAs are also bound by MA more frequently. Finally, MA-tRNA 
interaction could regulate viral and/or host translation. The A- 
richness of the HIV-1 genome results in suboptimal codon usage 
(Grantham and Perrin, 1986; Kypr and Mrazek, 1987; Sharp, 
1986) and an elevated number of Ile, Lys, Glu, and Val codons
in the Gag and Pol ORFs (Berkhout and van Hemert, 1994). 
Notably, Lys, Glu, and Val tRNAs are among those specifically 
bound by MA, providing a potential opportunity for Gag to regu- 
late its own translation as it accumulates to high levels and 
sequesters tRNAs (perhaps facilitating packaging as a conse- 
quence). Similarly, MA could inhibit translation of host mRNAs 
whose products may be deleterious for viral replication. Indeed, 
interaction of MA with host elongation factors via a tRNA bridge 
has been reported to inhibit translation in vitro (Cimarelli and
Luban, 1999). 

Overall, our global survey reveals surprising ways in which the 
interaction between Gag, viral, and host RNAs can change and 
modulate the process of virion genesis and genome packaging. 

EXPERIMENTAL PROCEDURES 

Proviral Plasmids and Cells 

HIV-1 NL4-3-derived proviral plasmids containing a 3xHA tag in the stalk region
of matrix (HIV-1 NL4-3 MA-3xHA) were constructed using overlap extension PCR.
Various derivatives of this construct encoding a catalytically inactive viral
protease (PR−), a Factor Xa cleavage site on either side of the HA tag (MA-FXa-3xHA
and MA-3xHA-FXa), deletions of the nucleocapsid domain (ΔNC), the
globular head of MA (ΔMA), the CA CTD (CA ΔCTD), or the Ψ signal (Δ105-278
and Δ105-278/Δ301-332) were constructed using PCR-based deletion
mutagenesis. Constructs carrying mutations at binding sites for Tsg101 and
ALIX proteins in the p6 domain of Gag (Δp6), the Gag myristoylation signal
(G2A), or at lysine residues in MA (MAK26,27T, MAK4T, and MAK6T) were constructed
using PCR overlap extension-based mutagenesis. Proviral plasmids
with deletions or mutations in Env, Rev, and RRE and encoding a Mason-Pfizer
monkey virus constitutive transport element were similarly constructed. Details
of the construction are described in Extended Experimental Procedures.

PAR-CLIP, HITS-CLIP, and RNA-Seq 

For CLIP experiments, HEK293T cells were grown in 10 cm dishes and transfected
with proviral plasmids using polyethylenimine (PolySciences). Virions
were harvested from filtered supernatant by ultracentrifugation through sucrose
and UV irradiated, whereas cells were irradiated while adhered to culture
dishes. Prior to UV crosslinking, a fraction of cells and virions were collected
for RNA-seq analysis. After crosslinking, the CLIP procedure was performed
on unfractionated lysates (after removal of nuclei) or on membrane and cytoplasmic
fractions.

For the CLIP procedure, cell and virion lysates were treated with RNase A or
RNase T1 and then incubated with Protein G-conjugated Dynabeads coated
with anti-HA or anti-NC antibodies. After immunoprecipitation of RNA-protein
adducts, beads were washed and treated sequentially with calf intestinal alkaline
phosphatase and then polynucleotide kinase and γ-32P-ATP. RNA-protein
adducts were eluted from the beads, separated by SDS-PAGE, blotted onto
nitrocellulose, and digested with proteinase K. RNA oligonucleotides were
then ligated to adapters, amplified by PCR, and sequenced using
an Illumina HiSeq 2000 platform.

CLIP-seq experiments were performed two to six times on cells and virions.
Further details of the method are in the Extended Experimental Procedures.

Bioinformatic Analyses 

The FASTX toolkit was used to process the reads from CLIP and RNA-seq
libraries before mapping. Reads were separated based on their 5' barcode
sequences and collapsed to generate a set of unique sequences. Unique
CLIP-seq and RNA-seq reads were mapped to the human (hg19) and HIV-1
genomes using Bowtie. SAMtools, in-house scripts, and GraphPad Prism
were used to calculate and display read densities associated with viral and
cellular RNAs and to calculate correlations. Clusters were generated using
PARalyzer and used as an input for the cERMIT motif finding tool. Further
details are given in the Extended Experimental Procedures.
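
The barcode separation and read collapsing steps can be sketched as follows. This is an illustration of the logic only, not the FASTX toolkit itself; it assumes exact-match barcodes of equal length at the 5' end of each read, and the function name and the barcode-to-sample mapping are hypothetical.

```python
from collections import Counter, defaultdict


def demultiplex_and_collapse(reads, barcodes):
    """Split reads by 5' barcode and collapse identical inserts.

    reads: iterable of read sequences (strings).
    barcodes: dict mapping barcode sequence -> sample name; all barcodes
        must have the same length (an assumption of this sketch).
    Returns {sample name: Counter mapping unique insert -> copy number}.
    """
    lengths = {len(barcode) for barcode in barcodes}
    if len(lengths) != 1:
        raise ValueError("barcodes must be non-empty and of equal length")
    n = lengths.pop()

    collapsed = defaultdict(Counter)
    for read in reads:
        sample = barcodes.get(read[:n])
        if sample is None:
            continue  # discard reads whose 5' end matches no known barcode
        collapsed[sample][read[n:]] += 1  # trim the barcode, count the insert
    return dict(collapsed)
```

The keys of each returned counter are the unique sequences that would be passed on to mapping, and the values record how many raw reads each one represents.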

Membrane Flotation Assays 

The membrane flotation assays were performed using HeLa and HOS cells
stably expressing cyan fluorescent protein-tagged Gag proteins as described
(Kutluay and Bieniasz, 2010) with modifications outlined in the Extended
Experimental Procedures.

ACCESSION NUMBERS 

CLIP-seq data were deposited to the GEO database with accession number
GSE61508.
SUMMARY

lncRNAs are known to regulate a number of different
developmental and tumorigenic processes. Here, we
report a role for lncRNA BCAR4 in breast cancer
metastasis that is mediated by chemokine-induced
binding of BCAR4 to two transcription factors with
extended regulatory consequences. BCAR4 binding
of SNIP1 and PNUTS in response to CCL21 releases
SNIP1's inhibition of p300-dependent histone
acetylation, which in turn enables the BCAR4-recruited
PNUTS to bind H3K18ac and relieve inhibition
of RNA Pol II via activation of the PP1 phosphatase.
This mechanism activates a noncanonical Hedgehog/GLI2
transcriptional program that promotes
cell migration. BCAR4 expression correlates with
advanced breast cancers, and therapeutic delivery
of locked nucleic acids (LNAs) targeting BCAR4
strongly suppresses breast cancer metastasis in
mouse models. The findings reveal a disease-relevant
lncRNA mechanism consisting of both direct coordinated
protein recruitment and indirect regulation
of transcription factors.

INTRODUCTION 

Emerging evidence has implicated long noncoding RNAs (lncRNAs)
as a new class of players involved in the development and progression
of cancer (Fatica and Bozzoni, 2014). However, the regulatory
roles played by lncRNAs in breast-cancer-associated
aberrant signaling pathways/transcriptional programs are not
completely understood. lncRNAs exert their regulatory functions
through specific interactions with proteins, including epigenetic
modifiers, transcriptional factors/coactivators, and RNP
complexes (Rinn and Chang, 2012). The specific lncRNA-protein
interactions could be mediated by canonical RNA-binding
domains (RBDs) (Lunde et al., 2007) or by noncanonical
RBDs, including the tryptophan-aspartic acid 40 (WD40) domain
and the RNA-binding domain abundant in Apicomplexans (RAP),
as demonstrated by recent mRNA interactome capture methodology
(Castello et al., 2012). Therefore, it is of great interest to uncover
new functions of lncRNAs by dissecting lncRNA-protein
interactions mediated by noncanonical RBDs in certain biological
processes.

The aberrant activation of the hedgehog signaling pathway in 
breast cancer has been connected with increased expression of 
the transcription factor, glioma-associated oncogene homolog 
1/2 (GLI1/2) (ten Haaf et al., 2009). GLI1/2-dependent target 
gene transcription has been shown to be involved in tumor cell 
growth and metastasis in solid tumors (Rubin and de Sauvage, 
2006). However, GLI-target transcription might be activated 
in the absence of the hedgehog ligand sonic hedgehog (SHH), 
especially in triple-negative breast cancer (TNBC) (Hui et al., 
2013), suggesting that other mechanisms/regulators may regulate
the activity of the GLI transcription factor. The direct binding
of lncRNAs to transcription factors (Geisler and Coller, 2013)
led us to speculate that the association of transcription factor
GLI with lncRNAs may function in regulating the GLI-dependent
transcriptional program essential for breast cancer progression
and metastasis.

The lncRNAs implicated in breast cancer represent a promising
class of therapeutic targets. Targeting noncoding RNAs
by using a locked nucleic acid (LNA)-based antisense oligonucleotide
strategy has been a longstanding interest (Dias and Stein,
2002), with several successful applications in targeting miRNAs
in cancer (Ling et al., 2013). However, therapeutic targeting
of lncRNA has not been well documented for breast cancer.
Thus, we aimed to determine the therapeutic potential of targeting
breast cancer-upregulated lncRNAs by an LNA-based antisense
oligonucleotide strategy.

Here, we report the identification of a signaling pathway that is
triggered by CCL21 and mediated by citron (rho-interacting,
serine/threonine kinase 21) (CIT) kinase to phosphorylate
the transcriptional factor GLI2, which regulates target gene
expression in breast cancer cells. The lncRNA BCAR4 is required
for phospho-GLI2-dependent gene activation via its direct
interaction with Smad nuclear-interacting protein 1 (SNIP1)
and Serine/threonine-protein phosphatase 1 regulatory subunit
10 (PPP1R10, also known as PNUTS). Mechanistically, the
BCAR4-SNIP1 binding releases the inhibitory role of SNIP1 on
p300 histone acetyltransferase (HAT) activity, leading to the
acetylation of histones, including a novel mark, H3K18ac, on
the promoters of GLI2 target transcription units. The acetylated
H3K18 can be further recognized by PNUTS, which is recruited
to the promoters of GLI2 target genes by BCAR4, to attenuate
the protein’s inhibitory effect on the enzymatic activity of PP1,
leading to hypophosphorylation of RNA polymerase II at Ser5.
Elevated BCAR4 expression is correlated with higher metastatic
potential and shorter survival time of breast cancer patients,
whereas its therapeutic inhibition by LNA displays in vivo efficacy
against metastasis. Our findings provide supporting evidence
for the regulatory roles played by lncRNAs in the progression
of aggressive breast cancers. Broadly, our results on the
therapeutic effectiveness of BCAR4 LNA against breast cancer
metastasis document an example showing the pharmacologic
value of lncRNA in human cancer and other diseases.

RESULTS 

BCAR4 Correlates with Advanced Breast Cancer and 
Regulates GLI-Mediated Transcription 

To identify breast cancer-relevant lncRNAs, we profiled the
expression of lncRNAs in two stage III breast cancer tissues
and their paired adjacent noncancerous tissues (Figure S1A
available online) by LncRNA Array (v. 3.0) (ArrayStar). An average
of 1,381 upregulated lncRNAs (range 1,034-1,729) and
1,458 downregulated lncRNAs (range 1,408-1,508) with significantly
differential expression (>3.0-fold) were identified (Figure 1A;
Table S1). We further compared the lncRNA expression
levels between breast cancer tissues and their paired adjacent
normal tissues based on the NCBI RefSeq database (which
contains 3,991 human lncRNAs with annotated NR accession
numbers), identifying 65 and 116 upregulated lncRNAs in the two
patient cases, respectively (>4.0-fold) (Figure 1B). Among
these lncRNAs, 21 were consistently upregulated in both patient
samples, of which BCAR4, initially identified through genetic
screening as a novel gene involved in tamoxifen resistance in
breast cancers (Meijer et al., 2006), showed the greatest upregulation
(logFC: 15.9 and 16.1, respectively) (Figures S1B and S1C).
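
The selection step described above (keep lncRNAs above a fold-change cutoff in each patient, then keep those shared by both) can be sketched in a few lines. This is a minimal illustration only, not the authors' analysis pipeline: the function names, the dictionary layout, and the toy fold-change values are all hypothetical, and the 4.0-fold default simply mirrors the cutoff quoted in the text.

```python
def upregulated(fold_changes, threshold=4.0):
    """Return the IDs whose tumor/normal fold change exceeds the threshold.

    fold_changes: dict mapping a lncRNA ID to its linear fold change
    (hypothetical layout, chosen for illustration).
    """
    return {name for name, fc in fold_changes.items() if fc > threshold}


def consistently_upregulated(patients, threshold=4.0):
    """Return the IDs that pass the threshold in every patient."""
    sets = [upregulated(p, threshold) for p in patients]
    return set.intersection(*sets) if sets else set()


# Toy values, invented for this example (not data from the study).
patient_1 = {"lnc_A": 12.0, "lnc_B": 5.5, "lnc_C": 1.2}
patient_2 = {"lnc_A": 9.0, "lnc_B": 2.0, "lnc_C": 6.0}

shared = consistently_upregulated([patient_1, patient_2])
print(sorted(shared))
```

With the toy values, only `lnc_A` passes the cutoff in both patients, which is the same intersection logic that reduces the 65 and 116 per-patient hits to a shared set.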

We first performed RNA in situ hybridization on breast cancer
tissue microarrays (clinicopathological features listed in Table
S2) using RNAScope 2.0 HD technology to examine the potential
correlation of BCAR4 with breast cancer. In a training set of
breast cancer tissue microarrays containing 232 cases,
BCAR4 exhibited positive staining in only 10% of the normal
breast tissues, whereas 54.10% of breast cancer tissues
showed positive BCAR4 expression (p = 0.0057) (Figure 1C). In
a validation set containing 170 cases, none of ten normal adjacent
breast tissues showed detectable BCAR4 expression, but
61.88% of breast cancer tissues exhibited positive BCAR4 staining
(p = 0.0011) (Figure 1C). Furthermore, breast cancers at an
advanced lymph node metastasis stage (TnN>0M≥0) showed
increased BCAR4 expression compared to early-stage
tumors with no lymph node metastasis (TnN0M0) (p = 0.0001,
training set; p = 0.0035, validation set) (Figure 1D). Elevated
BCAR4 expression also significantly correlates with shorter
survival time of breast cancer patients (n = 160; p = 0.0145)
(Figure 1E). We further analyzed the breast cancer database in
Oncomine, finding that BCAR4 expression correlates not only
with breast cancer but also with triple negativity, lymph node
metastasis, and 5-year recurrence (Figure S1D). The Oncomine
database also showed significant correlation of BCAR4 expression
with metastatic prostate cancer, lung cancer, colorectal
cancer, and rectal cancer (Figure S1D). To confirm this, we employed
the RNAScope Assay to analyze BCAR4 expression in
normal and cancer tissues from multiple organs, observing
increased BCAR4 expression in many types of human cancer
tissues, including colorectal, melanoma, and lung cancer,
compared to normal tissues (Figure 1F; Table S3). Taken
together, these results demonstrate the strong correlation of
BCAR4 expression with breast cancer progression and the relevance
of elevated BCAR4 expression to human cancer development
and progression.
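
For readers unfamiliar with the survival analysis referred to above, the Kaplan-Meier estimate can be sketched as follows. This is a generic textbook product-limit estimator written for illustration, not the software or settings used in the study; the function name, argument layout, and the follow-up values are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) survival estimate.

    times:  follow-up time for each patient
    events: 1 if the event (death) was observed, 0 if the patient was censored
    Returns a list of (time, survival probability) at each observed event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after time t.
        at_risk -= leaving
    return curve


# Toy follow-up data, invented for this example (not data from the study).
toy_times = [5, 8, 8, 12, 20]
toy_events = [1, 1, 0, 1, 0]
curve = kaplan_meier(toy_times, toy_events)
for t, s in curve:
    print(t, round(s, 3))
```

In practice, one curve would be computed per group (for example, BCAR4-positive versus BCAR4-negative tumors) and the two curves compared with a log-rank test; that comparison is not shown here.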

We then examined the expression of BCAR4 in a panel of
breast cancer cell lines, finding higher expression of BCAR4 in
mesenchymal-like cell lines with metastatic potential compared
to epithelial-like cell lines, which are considered nonmetastatic
(Figure 1G). Next, we examined the subcellular localization
of BCAR4 by RNA fluorescence in situ hybridization (FISH) and
qRT-PCR analyses on fractionated RNA, finding that the
BCAR4 transcript is predominantly localized in the nucleus (Figures
1H and S1E). BCAR4 has two major splice variants, a full-length
transcript (~1.3 kb) and an isoform lacking two alternate
exons (~680 bp), and our northern blot analysis revealed that
the full-length isoform was predominantly expressed in MDA-MB-231
cells, whereas the truncated isoform was barely expressed
(Figure S1F). Because a previous report suggested that BCAR4
may encode a small peptide in bovine oocytes (Thelie et al.,
2007), we generated an antibody using the predicted translated
peptide sequence. However, neither immunoblotting of MDA-MB-231
lysate nor in vitro translation assays showed protein-coding
potential of BCAR4 (Figure S1G; data not shown).

We next analyzed the effect of BCAR4 knockdown on activation
of key signaling pathways in breast cancer cells using the
Cignal Finder 45-Pathway Reporter Array. Either
small interfering RNA (siRNA) or LNA efficiently depleted
BCAR4 expression (Figures S1H and S1I), and knockdown of
BCAR4 dramatically inhibited GLI reporter luciferase activity
but not that of other transcription factor reporters (Figure 1I). qRT-PCR
analysis confirmed decreased expression of endogenous GLI
target genes with BCAR4 knockdown (Figure 1J). These data
suggest a potential role of BCAR4 in mediating the GLI-dependent
hedgehog signaling pathway in breast cancer cells.
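
The reporter readout above is expressed as a firefly/Renilla luciferase ratio (see the Figure 1 legend). A minimal sketch of that normalization follows. Dividing the knockdown ratio by the control ratio is a common convention assumed here for illustration, not a step confirmed by the text; the function name and the luminescence values are hypothetical.

```python
def relative_reporter_activity(firefly, renilla, control_firefly, control_renilla):
    """Normalize firefly signal to Renilla, then express it relative to control.

    All four arguments are raw luminescence readings (arbitrary units).
    The control-relative step is an assumed convention for this sketch.
    """
    if renilla <= 0 or control_renilla <= 0 or control_firefly <= 0:
        raise ValueError("luminescence readings must be positive")
    sample_ratio = firefly / renilla
    control_ratio = control_firefly / control_renilla
    return sample_ratio / control_ratio


# Toy readings, invented for this example (not data from the study).
activity = relative_reporter_activity(
    firefly=200.0, renilla=100.0, control_firefly=1000.0, control_renilla=100.0
)
print(round(activity, 3))
```

A value well below 1 for a given pathway reporter would indicate reduced reporter activity after knockdown, which is the pattern described for the GLI reporter.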

Identification and Biochemical Characterization of
BCAR4-Associated Proteins

Through RNA pull-down, followed by mass spectrometry (MS) 
analysis, we identified that in vitro-transcribed biotinylated 
BCAR4 sense transcript associated specifically with CIT kinase, 
GLI2, SNIP1, and PNUTS, even under high-stringency wash conditions.
However, the antisense transcript of BCAR4 associated
with some general RNA-binding proteins that were also bound 
Figure 1. BCAR4 Correlates with Breast Cancer Metastasis

(A) Scatter plots of lncRNAs significantly upregulated (red) or downregulated (green) in two pairs of TNBC tissues compared to the matched adjacent normal
tissues (NBT). x and y axes, averaged normalized signal values (log2 scaled); green lines, fold changes = 4.

(B) Commonly upregulated lncRNAs in two pairs of TNBC compared to NBT.

(C) RNAScope detection of BCAR4 expression in human breast cancer and adjacent normal tissues. Left panel: representative images; right panel: statistical
analysis of training set (10 normal tissues versus 222 cancer tissues) and validation set (10 normal tissues versus 160 cancer tissues).

(D) RNAScope detection of BCAR4 expression in nonmetastasis (TnN0M0) versus metastasis (TnN>0M≥0) breast cancer tissue. Left panel: representative
images; right panel: statistical analysis of training set (167 nonmetastasis versus 55 metastasis) and validation set (66 nonmetastasis versus 94 metastasis).

(E) Kaplan-Meier survival analysis of BCAR4 expression in breast cancer patients (n = 160).

(F) RNAScope detection of BCAR4 expression in multiple human tissues.

by the beads (Figures 2A and S2A; Table S4). Of note, in one of
two biological repeats of the RNA pull-down experiment, we
observed a relatively abundant association of BCAR4 with
heterogeneous nuclear ribonucleoproteins, which have been reported
to bind other lncRNAs (Carpenter et al., 2013; Huarte
et al., 2010). Furthermore, the MS data indicated the potential
phosphorylation of GLI2 at Ser149 (Figure S2B).

The RNA pull-down assays with cell lysate further confirmed
the specific association of BCAR4 with the proteins identified
by MS analysis (Figure 2B). An in vitro RNA-protein binding assay
revealed that only PNUTS and SNIP1 directly interact with
BCAR4 (Figures 2C and S2C). Protein domain mapping studies
demonstrated that BCAR4 binds the 97-274 amino acid (aa) region
of SNIP1 and the 674-750 aa region of PNUTS
(Figures 2D and 2E). The 97-274 aa region of SNIP1 encodes a
domain known as the domain of unknown function (DUF) and
has been suggested to bind miRNA (Yu et al., 2008), which is
consistent with our observation that the DUF of SNIP1
serves as the RNA-binding domain for BCAR4. PNUTS also has
an RNA-binding motif, the 674-750 aa region known as the RGG
box (Kim et al., 2003). To further understand the BCAR4-protein
interactions in vivo, we performed immunoprecipitation using
antibodies against CIT, GLI2, SNIP1, and PNUTS, respectively,
under the condition of BCAR4 knockdown (Figures S2D and
S2E), finding that knockdown of BCAR4 impaired the interaction
of PNUTS with CIT, GLI2, and SNIP1 but had minimal
effect on the association of CIT, GLI2, and SNIP1 with each other
(Figure S2E). Given the observation that only SNIP1 and PNUTS
directly bound to BCAR4 (see Figure 2C), our data suggest that
SNIP1 mediates the association of CIT and GLI2 with BCAR4
and that SNIP1 and PNUTS bind distinct regions of BCAR4.

To map the BCAR4 sequence motifs responsible for SNIP1
and PNUTS binding, we performed an in vitro RNA pull-down
followed by dot blot assay (Yang et al., 2013). The
motif sequences of BCAR4 bound/protected by SNIP1 and
PNUTS were identified to encompass ²¹²TGT...GGA³¹¹ and
⁹⁶⁸GTT...ATA¹⁰⁸⁷, respectively (Figure 2F). In contrast, the glutathione
S-transferase (GST) protein showed no specific binding
to any region of BCAR4 (Figure 2F). Deletion of the corresponding
sequence of BCAR4 (Δ212-Δ311) abolished its interaction
with SNIP1 but had no effect on PNUTS binding (Figure 2G).
Deletion of the motif sequence Δ968-Δ1087 of BCAR4 abolished
its interaction with PNUTS, but not with SNIP1 (Figure 2G).
Electrophoretic mobility shift assays (EMSAs) were further used
to confirm the direct binding of BCAR4 with SNIP1 and PNUTS.
Incubation of BCAR4 RNA probes (nucleotides [nt] 235-288
and nt 991-1044) with recombinant SNIP1 and PNUTS, respectively,
resulted in specific gel retardation (Figure 2H). Under
these conditions, no shift was observed when the corresponding
cold probes were used (Figure 2H). We therefore conclude
that BCAR4 directly binds to SNIP1 and PNUTS via two distinct
regions.



Given MS data showing that GLI2 is phosphorylated at Ser149
and associates with CIT kinase (see Figures 2A and S2B), we
reasoned that CIT may serve as a kinase to phosphorylate
GLI2. An in vitro kinase assay indicated that bacterially expressed
wild-type (WT) GLI2, but not the S149A mutant, was phosphorylated
by CIT (Figure S2F). ULK3 served as the positive control
due to its reported ability to phosphorylate GLI (Maloverjan
et al., 2010). An in vitro RNA-protein binding assay using biotinylated
BCAR4 and GLI2 proteins phosphorylated by CIT
in vitro showed no interaction (Figure S2G).

To investigate the role of GLI2 Ser149 phosphorylation in vivo,
we generated rabbit polyclonal antibodies that specifically
recognize Ser149-phosphorylated GLI2, referred to as the p-GLI2
(Ser149) antibody, which specifically detected bacterially purified
GLI2 protein that was phosphorylated by CIT in vitro, with minimal
reactivity toward GLI2 phosphorylated by ULK3 (Figure 2I). We
conclude that the p-GLI2 (Ser149) antibody specifically recognizes
CIT-mediated Ser149 phosphorylation of GLI2. Next, we evaluated
the level of phospho-GLI2 in breast cancer by immunohistochemistry
(IHC) analysis of clinical tumor specimens, finding
higher p-GLI2 (Ser149) levels in invasive breast cancer tissues
compared with adjacent normal tissues (p = 0.0087) (Figure 2J).
Our IHC staining further revealed increased p-GLI2 (Ser149) levels
in multiple cancer types compared to their corresponding normal
tissues (Figure S2H; Table S5). IHC analysis also revealed higher
CIT expression in invasive breast cancer compared with adjacent
normal breast tissues (p = 0.0055) (Figure S2I), and the staining of
phosphorylated GLI2 strongly correlated with that of BCAR4 and
CIT (data not shown). Taken together, we found
that BCAR4 binds a protein complex containing
SNIP1, PNUTS, phosphorylated GLI2, and CIT via its direct interaction
with SNIP1 and PNUTS.

CCL21 Induces GLI2 Ser149 Phosphorylation and
Nuclear Translocation of Phosphorylated GLI2

The CIT kinase-mediated GLI2 phosphorylation prompted us 
to investigate whether this phosphorylation could be triggered 
in MDA-MB-231 cells by hedgehog signaling. Surprisingly, 
although the ligand SHH activated hedgehog signaling in Daoy
cells, as evidenced by stimulated SHH gene induction as previously
reported (Wang et al., 2012), minimal effect was observed in 
MDA-MB-231 cells (Figure S3A), and no phosphorylated GLI2 
was detected (data not shown), suggesting that a noncanonical 
hedgehog signaling pathway, involving Ser149-phosphorylated 
GLI2, may exist in breast cancer. 

We then explored whether extracellular signals that activate 
CIT kinase could also trigger GLI2 phosphorylation in breast can- 
cer cells. Given that CIT kinase can be activated by GTPase Rho 
proteins (Madaule et al., 1998), we first screened the CIT-Rho 
interaction in breast cancer cells. Although CIT kinase is consti- 
tutively associated with RhoA as previously reported (Gai et al., 
2011), the presence of Rho activator specifically triggered the 



(G) qRT-PCR detection of BCAR4 expression in a panel of cell lines.

(H) Nuclear localization of BCAR4 detected by RNA FISH in MDA-MB-231 cells.

(I) Identification of signal pathways affected by BCAR4 knockdown in MDA-MB-231 cells. x and y axes, normalized ratio of firefly/Renilla luciferase activities.
(J) qRT-PCR detection of GLI-target gene expression. Error bars, SEM of three independent experiments (*p < 0.05, **p < 0.01, and ***p < 0.001).

See also Figure S1 and Tables S1, S2, and S3.
Figure 2. Identification and Biochemical Characterization of BCAR4-Associated Proteins

(A) A list of top BCAR4-associated proteins identified by RNA pull-down and MS analysis in MDA-MB-231 cells: R1 and R2 (biological repeat 1 and 2).

(B and C) Immunoblot (IB) detection of proteins retrieved by in vitro-transcribed biotinylated BCAR4 from MDA-MB-231 cell lysates (B) and indicated recombinant
proteins (C).

interaction between RhoC and CIT kinase (Figure S3B). We then
hypothesized that RhoC-activating stimuli may activate
CIT kinase. Indeed, we screened 13 known growth factors/cytokines/chemokines
involved in RhoC activation and breast cancer
metastasis (Favoni and de Cupis, 2000; Kakinuma and Hwang,
2006), finding that CXCL12, CCL21, IGF-1, PDGF-BB, and
TGF-β1 enhanced the interaction between RhoC and CIT (Figure
3A). The same stimuli induced activation of CIT kinase, as indicated
by phosphorylation of MLC, a classic CIT kinase substrate
(Yamashiro et al., 2003), with CCL21 exhibiting the highest induction
(Figure S3C). We then tested the phosphorylation of
GLI2 in MDA-MB-231 cells treated with CXCL12, CCL21, IGF-1,
PDGF-BB, and TGF-β1, finding that CCL21 dramatically
induced Ser149 phosphorylation of GLI2 (Figure 3B), which
was significantly reduced by CIT knockdown (Figure 3C).
Consistent with the previous finding that CCL21-CCR7 autocrine
signaling is critical for breast cancer metastasis (Muller et al.,
2001), treatment of MDA-MB-231 cells with either neutralizing
anti-CCL21 or anti-CCR7 antibodies inhibited basal or CCL21-induced
GLI2 phosphorylation (Figures S3D and S3E). CCL21
treatment also dramatically induced GLI2 Ser149 phosphorylation
in a panel of additional cancer cell lines, ruling out the possibility
of a cell line-specific effect (Figure S3F).

Next, we investigated the functional consequence of Ser149
phosphorylation on GLI2. In the cytoplasm, GLI is associated
with the suppressor of fused homolog (SUFU), which regulates
the cellular localization of GLI (Dunaeva et al., 2003). We
performed coimmunoprecipitation experiments and observed
that CCL21 treatment induced dissociation between GLI2 and
SUFU (Figure S3G), whereas the exogenously expressed GLI2
S149A mutant failed to release from SUFU in response to
CCL21 (Figure 3D). Given that SNIP1, which is in the same complex
with GLI2 (see Figure 2A), harbors an FHA domain that recognizes
phosphoserine/threonine, we hypothesized that Ser149
phosphorylation of GLI2 is required for its interaction with SNIP1
via the FHA domain. Indeed, either knockdown of CIT or introduction
of the S149A mutant reduced the CCL21-induced interaction
between GLI2 and SNIP1 (Figures 3C and 3E). Consistently,
deletion or point mutation of amino acids that are critical for
FHA domain function (Durocher et al., 2000) also abolished
SNIP1’s interaction with phosphorylated GLI2 (Figures 3F and
3G). We then performed nuclear fractionation experiments,
finding that phosphorylated GLI2 translocated to the nucleus
upon CCL21 treatment, whereas CIT, SNIP1, and PNUTS did
not exhibit relocation (Figure 3H). The phospho-GLI2-specific
antibody also exhibited nuclear staining patterns in breast
cancer tissue samples (see Figure 2J). Knockdown of CIT or
SNIP1 abolished CCL21-induced nuclear translocation of GLI2
(Figure 3I). In accordance with this, the GLI2 S149A mutant failed
to translocate into the nucleus upon CCL21 treatment (Figure
S3H). Our findings reveal a CCL21/CIT kinase/phospho-GLI2/SNIP1
signaling cascade in breast cancer cells, which
may represent a noncanonical hedgehog pathway.

BCAR4 Is Required for Transcription Activation of 
Phospho-GLI2-Dependent Target Genes in Breast 
Cancer Cells 

To test if CCL21/CIT/SNIP1 signaling axis-mediated phospho-GLI2
nuclear translocation leads to the activation of GLI target
genes, we performed a chromatin immunoprecipitation (ChIP)
assay using antibodies against GLI2 or phospho-GLI2, finding
that Ser149-phosphorylated GLI2 was present on the promoters
of several well-established GLI target genes, PTCH1, IL-6,
MUC5AC, and TGF-β1, but not on the promoter of a non-GLI
target gene, RPLP0 (Figures 4A and 4B). We then performed
a chromatin isolation by RNA purification (ChIRP) assay to
examine the genomic occupancy of BCAR4, finding that in
response to CCL21 treatment, BCAR4 was recruited to the
promoters of PTCH1, IL-6, MUC5AC, and TGF-β1 (Figures 4C,
S3I, and S3J). Consistently, either knockdown of BCAR4 or
overexpression of the GLI2 S149A mutant dramatically impaired
CCL21-induced expression of the PTCH1, IL-6, MUC5AC, and
TGF-β1 genes (Figure 4D; data not shown).

One of the major biological roles of GLI is to modulate
gene expression related to cell migration and invasion (Feldmann
et al., 2007). Thus, we examined the effect of GLI2, BCAR4,
and other BCAR4-bound proteins on breast cancer cell invasion
and migration. Treatment of MDA-MB-231 cells with validated
siRNAs against BCAR4, CIT, SNIP1, or PNUTS or a neutralizing
antibody against CCL21 dramatically inhibited cell
migration (Figures 4E-4G) and invasion (Figure 4H; data not
shown) but did not affect cell proliferation (Figure S4A). Consistently,
stable knockdown of BCAR4 by small hairpin RNAs
(shRNAs) in MDA-MB-231 LM2 cells reduced the migration and
invasion properties of these cells (Figures S4B-S4D). We also
tested if BCAR4 is critical for migration and invasion of those
metastatic cancer cell lines that respond to CCL21 treatment
(see Figure S3F). Our data showed that while knockdown of
BCAR4 had no effect on proliferation of HCT116, H1299,
HepG2, and Hey8 cells (Figures S4E and S4F), the migration
and invasion of these cells were significantly reduced (Figures
S4G and S4H; data not shown). In addition, CCL21-induced
GLI2 target gene expression in these cell lines was inhibited
by BCAR4 knockdown (Figures S4I and S4J; data not shown).



(D and E) IB detection of Myc-tagged SNIP1 (D) and PNUTS (E) (WT versus domain truncation mutants) retrieved by in vitro-transcribed biotinylated BCAR4.
Lower panels: graphic illustration of the domain structure of SNIP1 (D) or PNUTS (E).

(F) In vitro RNA-protein binding, followed by dot blot assays. Bottom panel: schematic illustration of the BCAR4 sequence motifs that are recognized by SNIP1 and
PNUTS, respectively.

(G) IB detection of proteins retrieved by in vitro-transcribed biotinylated BCAR4 (WT versus Δ212-Δ311 and Δ968-Δ1087) from MDA-MB-231 cell lysates.

(H) EMSA of recombinant SNIP1 and PNUTS binding to BCAR4 nt 235-288 and nt 991-1044, respectively.

(I) In vitro kinase assay showing CIT-mediated phosphorylation of GLI2 (WT versus S149A). *, unspecific band.

(J) IHC staining of phospho-GLI2 (S149) in human breast cancer and adjacent normal tissues. Left: representative image. Right: statistical analysis based on 10
normal tissues versus 222 cancer tissues is shown.

See also Figure S2 and Tables S2, S4, and S5.
Figure 3. Identification of a Noncanonical Hedgehog Signaling Pathway Mediated by CCL21/CIT/Phospho-GLI2 Signaling Axis

(A and B) Immunoprecipitation (IP) and IB detection of CIT-RhoC interactions (A) and GLI2 phosphorylation (B) in cells treated with indicated growth factors,
cytokines, or chemokines.

(C) IP and IB of GLI2 phosphorylation in cells transfected with indicated siRNAs followed by CCL21 treatment.

(D and E) IP and IB detection of GLI2-SUFU (D) or GLI2-SNIP1 (E) interactions in MDA-MB-231 cells transfected with GLI2 (WT versus S149A) followed by CCL21
treatment.

Given that BCAR4 is critical for the metastatic potential of cancer
cells and our observation of lower BCAR4 expression levels in
nonmetastatic breast cancer cell lines compared to metastatic
breast cancer cell lines (see Figure 1G), we reasoned that overexpression
of BCAR4 in a nonmetastatic cell line may increase its
metastatic potential. MCF-7 is a nonmetastatic breast cancer
cell line but expresses CCR7, the receptor for CCL21 (Muller
et al., 2001). Indeed, stimulation of MCF-7 cells with CCL21
modestly enhanced their invasion (Figure 4I). However, overexpression
of full-length BCAR4, but not the deletion mutants
abolishing SNIP1 or PNUTS binding, in MCF-7 cells (Figure S4K)
increased invasion and GLI2 target gene expression
even under the basal condition (Figures 4I, 4J, and S4L), which
was not due to an effect on cell proliferation (Figure S4M). These
data strongly argue for an important role of BCAR4 in the phospho-GLI2-mediated
transcription activation of a subset of genes,
which may contribute to breast cancer cell migration and
invasion.

BCAR4 Binds SNIP1 and Releases the Inhibitory Effect 
of SNIP1 on p300 HAT Activity 

Next, we investigated the molecular mechanism by which
BCAR4 regulates GLI2 target gene expression. Considering
that BCAR4 directly interacts with SNIP1 in vitro, we explored
whether this interaction is functionally important in vivo by examining
the SNIP1-BCAR4 interaction with an RNA immunoprecipitation
(RIP) assay, finding that in response to CCL21 treatment,
SNIP1 bound to BCAR4 in multiple cancer cell lines (Figures
S5A-S5C). As a control, no interaction between SNIP1 and
NEAT2, an abundant nuclear lncRNA, was observed (Figures
S5A-S5C). As expected, deletion of the 97-274 aa region abolished
the SNIP1-BCAR4 interaction (Figure 5A), which is consistent
with our previous observation that the DUF domain of SNIP1 is
required for the SNIP1-BCAR4 interaction (see Figure 2D). Surprisingly,
deletion of the FHA domain (region 274-349 aa) of SNIP1
led to a constitutive SNIP1-BCAR4 interaction (Figures 5A and
S5D), suggesting that binding to phosphoserine/threonine via
its FHA domain is required for SNIP1’s subsequent interaction
with BCAR4, possibly through a mechanism involving a
conformational change of SNIP1 upon phospho-GLI2 binding.
Indeed, FHA domain mutants of SNIP1 all failed to interact with
BCAR4, whereas wild-type SNIP1, along with the D356N mutant,
which exhibits no effect on phospho-GLI2 binding, was able to
bind BCAR4 (Figure 5B). These data suggest that SNIP1’s FHA
domain may block the DUF domain, preventing the SNIP1-BCAR4
interaction. Upon stimulation, the FHA domain recognizes phospho-Ser149
of GLI2, which causes conformational changes that
may expose the DUF domain for BCAR4 binding.

SNIP1 has been reported to interact with p300 and potentially
regulates p300-dependent gene transcription (Kim et al., 2000).
Although immunoprecipitation of SNIP1 confirmed its interaction
with p300, the interaction was not affected by depletion of
BCAR4 (Figure S5E). Deletion of either the DUF domain of SNIP1 (region
97-274 aa) or the BCAR4 SNIP1-binding motif (nt 212-311)
exhibited minimal effect on the SNIP1-p300 interaction (Figures S5F
and S5G). We then examined the HAT activity of p300 in the
presence of SNIP1 and/or BCAR4. Surprisingly, the HAT activity
of p300 was strongly inhibited by recombinant SNIP1 but could
be rescued by in vitro-transcribed BCAR4 RNA (Figure 5C). This
rescue was dependent on the interaction between BCAR4 and
SNIP1’s DUF domain because the presence of BCAR4 alone
had no effect on the HAT activity of p300. Moreover, deletion
of BCAR4’s SNIP1-binding motif (nt 212-311) abolished the
rescue of p300’s HAT activity (Figure 5C). Therefore, our data
indicate that the interaction between SNIP1 and BCAR4
releases the inhibitory role of SNIP1 on the HAT activity of p300.

Although it has been suggested that SNIP1 regulates the 
p300-dependent transcription of multiple signaling pathways 
(Fujii et al., 2006; Kim et al., 2000, 2001), the mechanism is 
not clear. We mapped the domains of SNIP1 that may interact 
with p300 and found that while both the N-terminal (2-80 aa) 
and DUF domain (97-274 aa) of SNIP1 were required for 
p300 binding (Figure S5H), the DUF domain of SNIP1 is the 
minimum region required to inhibit the enzymatic activity 
of p300 (Figure 5D). By incubating SNIP1 with the p300 catalytic
unit (aa 1198-1806) and derivative truncation mutants, we
found that the DUF domain of SNIP1 interacts with the PHD (aa
1198-1278) and CH3 (aa 1664-1806) domains of the p300 catalytic
unit, which may interfere with p300’s HAT activity (Figure 5E).
According to our in vitro observations, the DUF domain also 
binds BCAR4, raising a possible role of BCAR4 in regulating 
p300’s HAT activity. Indeed, in the presence of BSA and 
tRNA, p300 exhibited dose-dependent HAT activity, which 
was abolished in the presence of SNIP1 DUF domain alone 
(Figure 5F). In contrast, in the presence of sense but not anti- 
sense BCAR4, p300 HAT activity was largely rescued (Fig- 
ure 5F). These data suggest that the DUF domain of SNIP1 
binds PHD and CH3 domains of p300 to inhibit the HAT 
activity, whereas signal-induced binding of BCAR4 to SNIP1 
DUF domain releases its interaction with the catalytic domain 
of p300, leading to the activation of p300. 

p300-mediated histone acetylation is critical for transcription 
activation (Wang et al., 2008). We then screened histone acety- 
lation on GLI2 target gene promoters, finding that H3K18ac, 
H3K27ac, H3K56ac, H4K8ac, H4K12ac, and H4K16ac 
were induced by CCL21 treatment in breast cancer cells, with 
H3K18ac showing the highest level (Figure 5G). Knockdown of 
BCAR4 abolished CCL21-induced H3K18 acetylation on GLI2
target gene promoters; however, this was not due to reduced
recruitment of phosphorylated-GLI2 or p300 to GLI2 (Figure 5H).
These findings suggest that BCAR4 activates p300 by binding
SNIP1’s DUF domain to release the inhibitory effect of SNIP1
on p300, which results in the acetylation of histone marks
required for gene activation. 



(F and G) IP and IB detection of GLI2-SNIP1 interactions in cells transfected with SNIP1 (WT versus Δ274-Δ349) (F) or (WT versus FHA domain point mutants) (G)
followed by CCL21 treatment.

(H and I) IP and IB (H) and immunofluorescence (I) detection of phospho-GLI2 nuclear translocation in cells treated with CCL21 at different time points
(H) or transfected with indicated siRNAs followed by CCL21 treatment (I).

See also Figure S3.
Recognition of BCAR4-Dependent Histone Acetylation
by PNUTS Attenuates Its Inhibitory Effect on PP1
Activity

Based on our data that the 3' region of BCAR4 interacts with PNUTS
in vitro, we next examined this interaction in vivo by RIP experiments.
We found that PNUTS constitutively interacts with
BCAR4 via its RGG domain (Figures 6A, S5A-S5C, and S6A),
an observation that is consistent with our in vitro data (see Figure
2E). PNUTS functions as a regulatory subunit for PP1, inhibiting
the phosphatase activity of PP1 (Kim et al., 2003). As such,
we wondered whether BCAR4 could regulate PP1’s phosphatase
activity via binding PNUTS. The immunoprecipitation assay
indicated that knockdown of BCAR4 has minimal effect on the
PNUTS-PP1A interaction (Figures S1I and S6B). As previously
reported (Kim et al., 2003), the phosphatase activity of PP1
was inhibited by PNUTS (Figure S6C). However, neither sense
nor antisense BCAR4 could rescue PP1’s activity (Figure S6D),
leading us to explore whether any histone modifications could
rescue PP1 activity given that recruitment of the PNUTS/PP1
complex by BCAR4 could possibly activate the transcription of
GLI2 target genes.

Surprisingly, the inhibition of PP1’s phosphatase activity by
PNUTS was largely rescued by purified nucleosomes from HeLa
cells but not by recombinant nucleosomes, whereas neither
nucleosome alone affected PP1 activity (Figure 6B), suggesting
that binding of modified histones is critical to release PNUTS’s
inhibitory effect on PP1 activity. We then utilized a modified
histone peptide array to test this possibility, finding that PNUTS,
but not SNIP1, directly recognized acetylated histones, including
H4K20ac, H3K18ac, H3K9ac, H3K27ac, and H4K10ac (Figure
6C), which was confirmed by histone peptide pull-down
experiments (Figure 6D). A previous study indicated that a minimum
region from 445-450 aa of PNUTS is required to inhibit the
phosphatase activity of PP1 (Kim et al., 2003). We then examined
if acetylated histone could also recognize this region, finding that
deletion of aa 443-455 of PNUTS abolished its interaction with
acetylated histone H3 (Figure 6E), suggesting that the inhibitory
role of PNUTS, mediated by motif aa 443-455, is attenuated in
the presence of acetylated histone, leading to activation of
PP1 enzymatic activity. Consistently, acetylated, but not methylated,
histone peptides specifically rescued PP1 activity from
PNUTS inhibition (Figure 6F).

PP1 has been reported to dephosphorylate the carboxyl-terminal
domain (CTD) of RNA polymerase II at Ser5, which is
accumulated at promoter regions of target genes (Komarnitsky
et al., 2000; Washington et al., 2002). A recent study showed
that depletion of PNUTS in Drosophila results in global
hyperphosphorylation of RNA Pol II Ser5, leading to global transcription
pause and developmental defects (Ciurciu et al., 2013).
Therefore, we tested if PNUTS/PP1 regulates phosphorylation
of RNA Pol II Ser5, finding that knockdown of PNUTS led to
the hyperphosphorylation of RNA Pol II Ser5 (Figures S6E and
S6F). We then investigated the functional roles of the PNUTS-acetylated
histone interaction in regulating the status of RNA Pol II
Ser5 phosphorylation in the presence of a p300 inhibitor,
C646, which eliminated histone acetylation as represented
by H3K18ac (Figures 6G, S6G, and S6H). Our data indicate
that CCL21-triggered recruitment of PNUTS and PP1 to the promoters
of GLI2 target genes was not affected by the p300 inhibitor
(Figures 6G, S6G, and S6H) and that the levels of Pol II Ser5
phosphorylation on these promoters were decreased upon
CCL21 treatment (Figures 6G, S6G, and S6H). However, the
CCL21-induced hypophosphorylation of RNA Pol II Ser5 was
abolished in the presence of the p300 inhibitor (Figures 6G,
S6G, and S6H), suggesting that histone acetylation-dependent
PP1 activity modulates RNA Pol II Ser5 phosphorylation levels
at gene promoter regions. Taken together, the data demonstrate
the important roles of BCAR4, through its interaction with SNIP1
and PNUTS, in linking signal-induced acetylation of histones
to the general transcription machinery during the activation of the
GLI2 target genes in breast cancer cells.

BCAR4 as a Potential Therapeutic Target for Breast 
Cancer Metastasis 

To further confirm the functional connection between BCAR4 
and breast cancer metastasis, we performed functional rescue 
experiments in which we depleted BCAR4 by LNA, followed by 
overexpression in MDA-MB-231 cells of either LNA-resistant 
full-length BCAR4 or truncated mutants defective for SNIP1 or 
PNUTS binding (see Figures 2F-2H and S7A). In cell motility as-
says, knockdown of BCAR4 reduced migration and invasion of
MDA-MB-231 cells, which could be rescued by reintroduction
of full-length BCAR4, but not of the Δ212-Δ311 or Δ968-Δ1087
truncated forms (Figures S7B and S7C), even though full-length
BCAR4 and the truncated forms were expressed at equal levels
(Figure S7A) and cell proliferation was not altered (data not
shown). Knockdown of BCAR4 also curtailed the expression
of GLI2 target genes; reintroduction of full-length BCAR4,
but not of the Δ212-Δ311 or Δ968-Δ1087 truncated forms,
robustly rescued the induction of these genes
(Figures S7D and S7E). Consistently, knockdown of BCAR4
abolished the CCL21-induced interaction between SNIP1 and PNUTS,
while reintroduction of full-length BCAR4, but not of the Δ212-Δ311
or Δ968-Δ1087 truncated forms, robustly
rescued the interaction (Figure S7F). These data suggest that
BCAR4 exerts a quantitatively important role in GLI2-dependent



Figure 4. BCAR4 Is Required for CCL21-Triggered, Phospho-GLI2-Mediated Gene Activation and Cell Migration

(A-C) ChIP quantitative real-time PCR detection of GLI2 (A), phospho-GLI2 (B), or ChIRP quantitative real-time PCR detection of BCAR4 (C) occupancy on the
promoters of selected GLI2 target genes in MDA-MB-231 cells treated with CCL21. RPLP0 served as a non-GLI2 target gene control (A and B).

(D) qRT-PCR detection of GLI2 target gene expression in MDA-MB-231 cells transfected with control or BCAR4 siRNA followed by CCL21 treatment.

(E-G) Cell migration assays in MDA-MB-231 cells transfected with indicated siRNA (E and F) or treated with CCL21 neutralization antibody (G).

(H) Matrigel cell invasion assay in cells transfected with indicated siRNAs. Left: representative images; right: statistical analysis.

(I and J) Matrigel cell invasion assay (I) or qRT-PCR detection of GLI2 target genes (J) in MCF7 cells electroporated with indicated BCAR4 expression vectors
followed by CCL21 treatment.

Error bars, SEM of three independent experiments (*p < 0.05, **p < 0.01, and ***p < 0.001). See also Figure S4.

Figure 5. Signal-Induced BCAR4-SNIP1 Interaction Attenuates the Inhibitory Effect of SNIP1 on p300 HAT Activity

(A and B) RIP quantitative real-time PCR detection of the indicated RNAs retrieved by Myc-specific antibody in MDA-MB-231 cells electroporated with indicated
vectors followed by CCL21 treatment.

(C) In vitro HAT activity assays of p300 in the presence of WT SNIP1, full-length (FL) BCAR4, and their corresponding mutants as indicated.

(D) In vitro HAT activity assays of p300 in the presence of WT SNIP1 and its corresponding mutants as indicated.

(E) IB detection of the interaction between SNIP1 (aa 97-274) and p300 (aa 1198-1806) WT or truncations.

(F) In vitro HAT activity assays of p300 in the presence of SNIP1 aa 97-274 and BCAR4 sense or antisense RNAs.

(G) ChIP quantitative real-time PCR detection of acetylated histone mark occupancy on the promoters of selected GLI2 target genes in MDA-MB-231 cells
treated with CCL21.

(H) ChIP quantitative real-time PCR detection of phospho-GLI2, p300, and H3K18Ac occupancy on the PTCH1 promoter in MDA-MB-231 cells transfected with
indicated siRNAs followed by CCL21 treatment.

Error bars, SEM of three independent experiments (*p < 0.05 and **p < 0.01). See also Figure S5.

Figure 6. Recognition of BCAR4-Dependent Histone Acetylation by PNUTS Attenuates Its Inhibitory Effect on PP1 Activity

(A) RIP quantitative real-time PCR detection of the indicated RNAs retrieved by Myc-specific antibody in MDA-MB-231 cells transfected with indicated vectors
followed by CCL21 treatment.

(B) In vitro phosphatase activity assays of PP1A in the presence of indicated proteins or nucleosome.

(C) MODified Histone Peptide Array detection of histone marks recognition by SNIP1 or PNUTS. Top: representative images; bottom: binding specificity.

(D and E) IB detection of PNUTS retrieved by biotinylated histone peptides as indicated from lysate of MDA-MB-231 cells (D) or MDA-MB-231 cells electroporated
with indicated vectors (E).

target gene activation and cell migration/invasion via its direct
interactions with SNIP1 and PNUTS.

Next, we recapitulated the contribution of BCAR4 to breast
cancer metastasis in vivo using highly metastatic MDA-MB-231
LM2 cells harboring shRNA targeting BCAR4, which
showed reduced migration and invasion (see Figures S4B-
S4D). Bioluminescent imaging (BLI) measurements revealed
that mammary gland fat-pad injection of MDA-MB-231 LM2 cells
harboring control shRNA resulted in lung metastases in NOD/
SCID mice, whereas lung metastasis was significantly reduced
in two individual groups of mice injected with cells harboring
BCAR4 shRNA (Figure 7A), which was confirmed by quantifica-
tion of lung metastasis nodules (an average of 11.2 per
mouse in the control group and an average of two visible metastases
per mouse in the BCAR4 knockdown groups) and histological exam-
ination (Figures 7B and 7C). BCAR4 knockdown had no effect on
primary tumor size, tumor cell proliferation, or apoptosis (Figures
S7G and S7H), indicating that the metastasis suppression
phenotype is not secondary to impaired proliferation or
apoptosis. However, CD31, a marker for angiogenesis, was
significantly downregulated by BCAR4 knockdown (Figure S7H),
suggesting that the reduced lung metastasis burden is due to
defective angiogenesis. Independently, mice that received tail-vein
injection of BCAR4 knockdown cells rarely developed lung
metastases (Figures 7D-7F). Immunohistochemical analyses
confirmed efficient inhibition of metastasis (Figure S7I). These
data suggest that BCAR4 contributes to breast cancer metastasis
and that silencing of BCAR4 inhibits lung metastasis in transplant-
able mouse models.

To evaluate the therapeutic potential of BCAR4,
we synthesized LNAs targeting BCAR4. Transfection of LNAs
against BCAR4 into MDA-MB-231 cells produced strong knock-
down (see Figure S1I) and dramatically affected cell
migration and invasion (data not shown). Next, we examined
the therapeutic efficacy of systemically administered in vivo-opti-
mized LNAs in breast cancer metastasis prevention. Of note, two
individual LNA treatments significantly reduced lung metastases
(Figures 7G and 7H) without notable weight loss (Figure S7J).
Importantly, therapeutic LNA-mediated BCAR4 targeting was
confirmed by qRT-PCR analysis of lung metastatic nodules (Fig-
ure 7I). Taken together, our findings reveal a BCAR4-dependent
regulatory network that converges onto a noncanonical hedge-
hog signaling pathway mediated by phospho-GLI2 to control
metastatic initiation and progression in breast cancer.

DISCUSSION 

Effective treatment options for breast cancer metastasis, espe-
cially for TNBC, are not well established. lncRNA-based mecha-
nisms in breast cancer may represent crucial nodal points
for therapeutic intervention. Our studies have revealed that the
lncRNA BCAR4 is highly upregulated in advanced breast cancer
patients and contributes to breast cancer metastasis mediated
by chemokine-induced binding of BCAR4 to two transcription
factors with extended regulatory consequences, licensing the
activation of a noncanonical hedgehog/GLI2 transcriptional pro-
gram that promotes cell migration (Figure 7J). In a variety of can-
cer types, including prostate, breast, ovarian, and pancreatic
cancers, hedgehog signaling pathways, which are critical for tu-
mor progression and invasion, are aberrantly activated. We are
tempted to speculate that other lncRNAs in these cancer types
recognize covalent modifications of GLI2 or other proteins and
exert an analogous function to promote the aberrant cancer
signaling pathways that confer invasiveness
and metastatic propensity on cancer cells.

Although our data reveal that BCAR4 exerts a quantitatively 
important role in chemokine-dependent Hedgehog target gene 
activation in breast cancer cells, the full mechanisms by which 
it functions in development remain incompletely defined. 
BCAR4 is also highly expressed in human oocytes and placenta
(Godinho et al., 2011), suggesting its potential roles in develop-
ment. Interestingly, hedgehog ligands are expressed in a tis-
sue-specific manner, e.g., desert hedgehog (Dhh) expression
is specific to Sertoli cells of the testes and granulosa cells of
ovaries (Varjosalo and Taipale, 2008). These observations indi- 
cate that BCAR4 is also critical for GLI-mediated gene expres- 
sion during development. 

The BCAR4 upregulation in breast cancer could be the result 
of the dysregulation of estrogen receptor (ER). Previous studies 
have shown that BCAR4 is upregulated in response to tamoxifen 
treatment of breast cancer cells (Godinho et al., 2011); thus, up- 
regulation of BCAR4 could be the result of ER downregulation, 
as seen in TNBC. It is also possible that BCAR4 expression is
regulated at the transcriptional level by certain aberrant onco-
genic signaling pathways in breast cancer cells or by gene
amplification at the genomic level. Thus, how BCAR4 expression
is regulated requires further investigation.

The targeting of lncRNAs with LNAs in breast cancer has not
gained much momentum because few critical breast
cancer-relevant lncRNAs have been identified and the potential
anticancer effects of modulating lncRNAs in vivo have not been
rigorously investigated. The important prognostic capacity of BCAR4 and the
robust metastasis suppression by therapeutically delivered
LNA targeting BCAR4 documented in our study encourage
future development of lncRNA-based cancer therapies for pa-
tients at high risk for metastasis, an outcome currently lacking
effective chemotherapeutic options.

EXPERIMENTAL PROCEDURES

lncRNA Array Version 3.0

Total RNA was extracted from two pairs of fresh frozen infiltrating ductal car-
cinomas of the breast and their adjacent normal breast tissues. RNA samples
were subjected to human genome-wide lncRNA microarray (v. 3.0) analyses at
ArrayStar. lncRNA array data have been deposited in the Gene Expression
Omnibus database under accession GSE60689. Details are included in the
Extended Experimental Procedures.



(F) In vitro phosphatase activity assays of PP1A in the presence of PNUTS and modified histones H3 as indicated. 

(G) ChIP quantitative real-time PCR detection of H3K18Ac, PNUTS, Pol II Ser5 (normalized by Pol II occupancy), and PP1A occupancy on PTCH1 promoter in
MDA-MB-231 cells pretreated with C646 followed by CCL21 treatment.

Error bars, SEM of three independent experiments (*p < 0.05 and **p < 0.01). See also Figure S6.

Figure 7. The Potential Therapeutic Role of BCAR4 in Breast Cancer Metastasis

(A-C) Representative bioluminescent (BLI) images (A), metastatic nodule numbers in the lungs (B), or isolated lung bright-field imaging (top) and hematoxylin and
eosin staining (middle and bottom) (C) of mice with fat-pad injection of MDA-MB-231 LM2 cells harboring indicated shRNA. Data are means ± SEM (n = 5).

(D-F) Representative BLI images (D), lung colonization (E), and metastatic nodule numbers in the lungs (F) of mice with tail-vein injection of MDA-MB-231 LM2
cells harboring indicated shRNA. Data are means ± SEM (n = 5).

Tissue Specimens

Fresh frozen breast carcinomas and their adjacent normal tissues were pur-
chased from Asterand. Breast cancer tissue microarrays were purchased
from Biomax and U.S. BioLab and were grouped into two sets: a training
set (BC081120, BR1505a, and BR487 from Biomax) and a validation set
(Bre170Sur-01 from U.S. BioLab). All clinicopathological features of tissue
specimens are listed in Table S2. The protocol was approved by the Institu-
tional Review Board at the MD Anderson Cancer Center.

RNAScope Assay 

The RNAScope probe targeting BCAR4 was designed and synthesized by
Advanced Cell Diagnostics, and detection of BCAR4 expression was per-
formed using the RNAscope 2.0 High Definition (HD)-BROWN Assay in
accordance with the manufacturer’s instructions (Advanced Cell Diagnostics).
The images were acquired with a Zeiss Axioskop2 Plus Microscope.

RNA Pull-Down and Mass Spectrometry Analysis 

Biotin-labeled BCAR4 RNAs were in vitro transcribed with the Biotin RNA La-
beling Mix (Roche) and T7 or SP6 RNA polymerase (Ambion) and purified by
RNA Clean & Concentrator-5 (Zymo Research). The cell lysates were freshly
prepared using ProteaPrep Zwitterionic Cell Lysis Kit, Mass Spec Grade
(Protea) with Anti-RNase, Protease/Phosphatase Inhibitor Cocktail, Panobi-
nostat, and Methylstat supplemented in the lysis buffer. The BcMag Mono-
mer Avidin Magnetic Beads (Bioclone) were first prepared in accordance
with the manufacturer’s instructions and then immediately subjected to RNA
(20 μg) capture in RNA capture buffer (20 mM Tris-HCl [pH 7.5], 1 M NaCl,
and 1 mM EDTA) for 30 min at room temperature with agitation. The RNA-
captured beads were washed once with NT2 buffer (50 mM Tris-HCl [pH
7.4], 150 mM NaCl, 1 mM MgCl2, and 0.05% NP-40) and incubated with
30 mg cell lysates diluted in NT2 buffer supplemented with 50 U/ml RNase
OUT, 50 U/ml SUPERase•In, 2 mM dithiothreitol, 30 mM EDTA, and heparin
0.02 mg/ml for 2 hr at 4°C with rotation. The RNA-binding protein complexes
were washed sequentially with NT2 buffer (twice), NT2 high-salt buffer
containing 500 mM NaCl (twice), NT2 high-salt buffer containing 1 M NaCl
(once), NT2-KSCN buffer containing 750 mM KSCN (twice), and PBS
(once) for 5 min at 4°C and eluted by 2 mM D-biotin in PBS. The eluted pro-
tein complexes were denatured, reduced, alkylated, and digested with im-
mobilized trypsin (Promega) for MS analysis at the MD Anderson Cancer Center
Proteomics Facility.

In Vivo Breast Cancer Metastasis Assays 

All animal studies were performed with MD Anderson Cancer Center’s Institu-
tional Animal Care and Use Committee (IACUC) approval. In vivo spontaneous
and experimental breast cancer metastasis assays were performed as
described (Chen et al., 2012; Minn et al., 2005). For animal studies with LNA in-
jection, mice were intravenously injected with in vivo-grade LNAs (Exiqon) in
PBS (15 mg/kg) twice a week for three weeks after injection of MDA-MB-231
LM2 cells. Tumor growth and lung metastasis were monitored with the Xenogen
IVIS 100 Imaging System.

Data Analysis and Statistics 

Relative quantities of gene expression level were normalized to B2M. The rela- 
tive quantities of ChIP and ChIRP samples were normalized by individual in- 
puts, respectively. Results are reported as mean ± SEM of three independent 
experiments. Comparisons were performed using two-tailed paired Student’s 
t test. *p < 0.05, **p < 0.01, and ***p < 0.001. Fisher’s exact test was used for
statistical analyses of the correlation between each marker and clinical param-
eters. For survival analysis, the expression of BCAR4 was treated as a binary
variable, divided into “high” and “low” BCAR4 expression. Kaplan-Meier
survival curves were compared by the Gehan-Breslow test in GraphPad Prism
(GraphPad Software).
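
The summary statistics named above (mean ± SEM, a paired t statistic, and a Kaplan-Meier estimate) can be sketched in a few lines of Python. This is only an illustrative sketch and not the authors' analysis, which the text attributes to GraphPad Prism; the function names and the example numbers in the usage note are hypothetical. Converting the t statistic into a p value requires the t distribution, and comparing curves requires the Gehan-Breslow test; both are deliberately left out.

```python
import math
from statistics import mean, stdev


def mean_sem(values):
    """Return (mean, standard error of the mean) for replicate measurements."""
    return mean(values), stdev(values) / math.sqrt(len(values))


def paired_t_statistic(a, b):
    """Return (t, degrees of freedom) for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample SD of the paired differences
    if sd == 0:
        raise ValueError("all paired differences are identical")
    n = len(diffs)
    return mean(diffs) / (sd / math.sqrt(n)), n - 1


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events[i] is True for an observed event and False for a censored subject.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects that share the same time point.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # events and censored subjects both leave the risk set
    return curve
```

For example, with hypothetical data, `paired_t_statistic([2, 4, 6], [1, 2, 3])` gives a t statistic with 2 degrees of freedom, and `kaplan_meier([1, 2, 3, 4], [True, False, True, True])` steps the survival estimate down only at the three observed event times.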

ACCESSION NUMBERS 

The Gene Expression Omnibus database accession number for the triple- 
negative breast cancer LncRNA Array (v. 3.0) reported in this paper is 
GSE60689.

SUMMARY

The MYC oncoproteins are thought to stimulate tumor 
cell growth and proliferation through amplification of 
gene transcription, a mechanism that has thwarted 
most efforts to inhibit MYC function as potential 
cancer therapy. Using a covalent inhibitor of cyclin- 
dependent kinase 7 (CDK7) to disrupt the transcrip- 
tion of amplified MYCN in neuroblastoma cells, 
we demonstrate downregulation of the oncoprotein 
with consequent massive suppression of MYCN- 
driven global transcriptional amplification. This res- 
ponse translated to significant tumor regression in a 
mouse model of high-risk neuroblastoma, without 
the introduction of systemic toxicity. The striking 
treatment selectivity of MYCN-overexpressing cells
correlated with preferential downregulation of su- 
per-enhancer-associated genes, including MYCN 
and other known oncogenic drivers in neuroblas- 
toma. These results indicate that CDK7 inhibition, by 
selectively targeting the mechanisms that promote 
global transcriptional amplification in tumor cells, 
may be useful therapy for cancers that are driven by 
MYC family oncoproteins. 

INTRODUCTION 

Many human cancers depend on the deregulated expression of 
MYC family members for their aberrant growth and proliferation, 
with elevated expression of these oncogenes predicting aggres- 
sive disease and a poor clinical outcome (Eilers and Eisenman, 
2008; Wasylishen and Penn, 2010). Deactivation of MYC in cell 
lines and MYC-induced transgenic tumors causes proliferative
arrest and tumor regression (Arvanitis and Felsher, 2006; Soucek
et al., 2008), suggesting that effective targeting of MYC proteins 
would have broad therapeutic benefit. Recently, several groups 
reported that oncogenic MYC elicits its plethora of downstream 
effects in tumor cells through global transcriptional amplification, 
leading to massively upregulated expression of genes involved in 
multiple processes (Lin et al., 2012; Loven et al., 2012; Nie et al., 
2012; Schuhmacher and Eick, 2013). When present at physio- 
logical levels, MYC binds to the core promoters of actively 
transcribed genes; however, in tumor cells with MYC overex- 
pression, increased MYC levels are observed at both the core 
promoters and enhancers of the same set of genes, resulting 
in increased levels of transcripts per cell. This mechanism pro- 
vides an explanation for the lack of a common MYC transcrip- 
tional signature and for the diverse effects of deregulated MYC 
in cancer cells. Another general feature of deregulated MYC is 
its transcriptional regulation by super-enhancers (SEs), clusters 
of enhancers that are densely occupied by transcription factors, 
cofactors, and chromatin regulators (Hnisz et al., 2013). SEs 
are acquired by cancer cells through gene amplification, trans- 
location or transcription factor overexpression. They facilitate 
high-level expression of genes, including MYC, whose protein 
products are critical for the control of cell identity, growth, and 
proliferation, and which are especially sensitive to perturbation 
(Chapuy et al., 2013; Hnisz et al., 2013; Loven et al., 2013). These
emerging insights into the role of oncogenic MYC as an SE-asso- 
ciated transcriptional amplifier suggest that strategies aimed 
at disrupting the molecular mechanisms that drive this function 
might provide useful therapy for different MYC-dependent 
tumors. 

The transcription cycle of RNA polymerase II (Pol II) is regu- 
lated by a set of cyclin-dependent kinases (CDKs) that have crit- 
ical roles in transcription initiation and elongation (Larochelle 
et al., 2012). In contrast to the cell-cycle CDKs which are largely 
responsible for cell-cycle transition, these transcriptional CDKs
(especially CDK7, a subunit of TFIIH, and CDK9, a subunit of 
pTEFb) phosphorylate the carboxy-terminal domain (CTD) of 
Pol II, facilitating efficient transcriptional initiation, pause release 
and elongation. Moreover, most CDKs are activated through 
T-loop phosphorylation by a CDK-activating kinase (CAK), which 
in metazoans appears to be uniquely controlled by CDK7 (Fisher 
and Morgan, 1994; Glover-Cutter et al., 2009; Larochelle et al., 
2007, 2012; Rossignol et al., 1997; Serizawa et al., 1995). Inhibi- 
tion of transcriptional CDKs primarily affects the accumulation of 
transcripts with short half-lives, including antiapoptosis family 
members and cell-cycle regulators (Garriga and Graña, 2004;
Lam et al., 2001), rendering this group of kinases ideal candi- 
dates for blocking MYC-dependent transcriptional amplification. 

Here, we investigate whether inhibition of transcriptional CDKs 
can be exploited to disrupt aberrant MYC-driven transcription, 
using the deregulated expression of MYCN as a model. The 
MYCN protein shares most of the physical properties of MYC 
(Kohl et al., 1986) and is considered functionally interchange- 
able, based on the similarity of their transcriptional programs, 
the cellular phenotypes they induce, and the ability of MYCN to 
replace MYC during murine development (Boon et al., 2001; Ma-
lynn et al., 2000; Toyoshima et al., 2012). In neuroblastoma (NB),
a pediatric solid tumor arising in the peripheral sympathetic ner- 
vous system, MYCN amplification is typically associated with a 
dismal prognosis, regardless of the treatment used (Brodeur 
et al., 1984; Seeger et al., 1985). 

We demonstrate that THZ1, a newly developed covalent 
inhibitor of CDK7 (Kwiatkowski et al., 2014), selectively targets 
MYCN-amplified NB cells, leading to global repression of
MYCN-dependent transcriptional amplification. This response
induces sustained growth inhibition of tumors in a mouse model
of NB. The remarkable sensitivity of MYCN-amplified cells to
CDK7 inhibition is associated with preferentially reduced expres- 
sion of SE-associated oncogenic drivers, especially MYCN. 

RESULTS 

THZ1 Exerts Potent and Selective Effects in 
MYC-Deregulated Cells 

To identify CDKs whose depletion might lead to decreased
MYCN expression and subsequent apoptosis, we performed a
short hairpin RNA (shRNA) knockdown screen of CDKs with known
transcriptional activities (CDK7, CDK8, CDK9, CDK12, CDK13,
and CDK19) in Kelly cells, a human NB line that expresses very
high levels of MYCN RNA and protein due to genomically
amplified MYCN (100-120 copies per cell) (Schwab et al., 1983).
Genetic depletion of CDK7, CDK8, CDK9, or CDK19 led to marked
decreases of MYCN RNA and protein with a concomitant
increase in cleaved caspase 3 (CC3) expression (Figures S1A
and S1B available online). To reproduce these results
pharmacologically, we tested a panel of 11 inhibitors with activity against
transcriptional CDKs in three MYCN-amplified cell lines,
observing a range of sensitivities, with the highest potency
(IC50, 6-9 nM) shown by a newly developed covalent
phenylaminopyrimidine inhibitor of CDK7, THZ1 (Kwiatkowski et al., 2014)
(Figure S1C; Table S1). Similar results were obtained when THZ1
was tested against a larger panel of MYCN-amplified NB cell
lines with varying levels of MYCN expression (Figures 1A and
S1D). Importantly, NB cells without MYCN amplification were
far less sensitive to THZ1, with IC50 values averaging ten times
higher than those of MYCN-amplified cells (Figure 1A; Table
S1). Notably, the NBL-S cell line, which expresses high levels
of MYCN without genomic amplification (Cohn et al., 1990)
(Figure S1D), was quite sensitive to THZ1, while two nontransformed
lines (B6-MEFS and NIH 3T3) were relatively insensitive
(Figure 1A). MYCN-amplified cells also showed enhanced sensitivity
to THZ1R, a reversible analog of THZ1 that lacks the acrylamide
moiety required for covalent bond formation, although it was not
as potent as the covalent inhibitor (Figure 1B). The strong
selectivity of THZ1 for MYCN-amplified cells was not restricted to NB,
but extended to H262-BT111, a human primitive neuroectodermal
tumor cell line expressing amplified MYCN (K. Ligon,
personal communication) and Raji and Daudi lymphoma cells,
both characterized by MYC overexpression due to chromosomal
translocation (Nishikura et al., 1985; Veronese et al., 1995)
(Figure S1E). To probe this preferential effect further, we tested the
extent of target engagement in NB cells, using a biotinylated
derivative of THZ1 (bio-THZ1) with or without THZ1 pretreatment.
Bio-THZ1 consistently bound to CDK7 in both MYCN-amplified
and nonamplified untreated cells (Figure S1F), but binding became less
efficient after THZ1 treatment, suggesting that target recognition
was not a major factor in the enhanced inhibitory effects of THZ1
in MYCN-amplified cells.
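
As a rough illustration of how an IC50 like those quoted above can be read off a dose-response curve, the sketch below interpolates the 50% viability crossing on a log10 dose axis. The text does not state how the reported IC50 values were fitted, so the interpolation approach, the function name, and the example numbers are all assumptions made for illustration; a four-parameter logistic fit is the more usual choice for real data.

```python
import math


def ic50_by_interpolation(doses_nM, viability_pct):
    """Estimate the dose giving 50% viability by log-linear interpolation.

    doses_nM must be positive; viability_pct is percent viability relative
    to the vehicle control. Returns None if the curve never crosses 50%.
    """
    if any(d <= 0 for d in doses_nM):
        raise ValueError("doses must be positive to take log10")
    points = sorted(zip(doses_nM, viability_pct))  # ascending dose
    for (d_lo, v_lo), (d_hi, v_hi) in zip(points, points[1:]):
        # Look for the first interval in which viability falls through 50%.
        if v_lo >= 50 >= v_hi and v_lo != v_hi:
            frac = (v_lo - 50) / (v_lo - v_hi)
            log_ic50 = math.log10(d_lo) + frac * (
                math.log10(d_hi) - math.log10(d_lo)
            )
            return 10 ** log_ic50
    return None
```

With hypothetical readings of 75% viability at 1 nM and 25% at 100 nM, the estimate falls at the geometric midpoint, 10 nM. Interpolating on the log axis reflects the fact that dose series are usually spaced by constant fold changes rather than constant increments.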

Next, we studied the growth inhibitory effects of THZ1. MYCN-
amplified cells treated with THZ1 underwent cell-cycle arrest in
G2/M at 24 hr, an effect that was not observed in MYCN-nonam-
plified NB cells, even after 48 hr (Figure 1C). Moreover, THZ1 led
to a profound induction of apoptosis in high MYCN-expressing
cells, but not in cells expressing nonamplified MYCN (Figures
1D and S1G). Together, these data indicate that THZ1 induces
selective cytotoxicity not only in NB cells with MYCN amplifica-
tion, but also in other cancers overexpressing either the MYCN
or MYC oncogene.

THZ1 Inhibits Tumor Growth in a Mouse Model of Human 
MYCN-Amplified NB

Given the relative lack of target specificity of past CDK inhibitors, 
leading to adverse effects in normal cells (Lapenna and Gior- 
dano, 2009), we assessed the tolerability of THZ1 in non-tu- 
mor-bearing mice (n = 6) treated with 10 mg/kg intravenously 
twice daily. No systemic toxicity was observed even after 
4 weeks of continuous administration (data not shown). We 
next tested the therapeutic effects of THZ1 in xenograft models 
of MYCN-amplified human NB derived from subcutaneous flank
injection of Kelly cells. When tumors reached an optimal size
(mean volume, ~150 mm³; range, 75-235 mm³), the animals
were divided into two groups and treated with vehicle (n = 9) or
THZ1 as above (n = 14). Treatment was continued for a mean
of 20 days (range, 15-24 days) in the control group and
24 days (range, 20-28 days) in the THZ1 group. Mice receiving
THZ1 had a statistically significant reduction in tumor growth,
again without toxicity (Figure 1E). Two animals remained free
of tumor recurrence at 35 and 128 days posttreatment.

Tumors from vehicle-treated mice displayed histological fea-
tures of human NB with poorly differentiated, small round blue
cells displaying high mitotic activity (Figure 1F). By contrast,

Figure 1. THZ1 Exhibits High Potency and Selectivity against MYCN-Amplified Tumor Cells

(A) Dose-response curves of MYCN-amplified and nonamplified human NB and murine fibroblast cells after treatment with increasing concentrations of THZ1 for
72 hr. Percent cell viability relative to that of DMSO-treated cells is shown here and in (B). Data represent mean ± SD of three replicates here and in (B).

(B) Dose-response curves of NB cells treated as in (A) with the reversible CDK7 inhibitor THZ1R.

(C) Cell-cycle analysis of MYCN-amplified versus nonamplified NB cells exposed to THZ1 (100 nM × 24 and 48 hr) by flow cytometry with propidium iodide (PI)
staining. Results are representative of three replicates. The scale and axes are indicated in the lower left corner.

(D) Apoptosis analysis in MYCN-amplified and nonamplified NB cells treated with THZ1 as in (C) by flow cytometry with Annexin V staining. Data represent mean ±
SD of three replicates. ***p < 0.0001, **p < 0.001 (Student’s t test).

(E) Tumor volumes of MYCN-amplified human NB xenografts in NU/NU (Crl:NU-Foxn1nu) mice treated with THZ1 (10 mg/kg intravenously [i.v.] twice daily) (n = 14)
or vehicle (n = 9) for 28 days. Mean ± SD values are presented. ***p < 0.001; **p < 0.01; *p < 0.05 (multiple t test, Holm-Sidak method).

(F) Immunohistochemical (IHC) analysis of morphology (hematoxylin & eosin [H&E]), proliferation (Ki67), and apoptosis (cleaved caspase 3 [CC3]) in tumors
harvested from animals treated with vehicle or THZ1 as in (E) for 12 days. Scale bar represents 25 μm.

See also Figure S1 and Table S1.

Figure 2. THZ1 Inhibits General Transcription and Cell-Cycle Regulation in MYCN-Amplified Tumor Models

(A) Immunoblot analysis of RNA Pol II CTD phosphorylation in MYCN-amplified and nonamplified NB cells treated with DMSO or THZ1 at the indicated con-
centrations for the indicated times.

(B) Immunoblot analysis of RNA Pol II CTD phosphorylation in human NB xenograft tumor cells obtained from mice treated with vehicle or THZ1 (10 mg/kg i.v.
twice daily) for 12 days.

(C) Immunoblot analysis of MCL1 in MYCN-amplified (Kelly, NGP) versus nonamplified (SK-N-FI, SH-SY5Y) NB cells following treatment with THZ1 at the
indicated concentrations and durations.

(D) Immunoblot analysis of proteins involved in cell-cycle progression in MYCN-amplified (Kelly, IMR-32) and nonamplified (SH-SY5Y) NB cells following
treatment with THZ1 100 nM for 3 and 6 hr.

See also Figure S2. 



the vast majority of tumor cells in the THZ1-treated animals
demonstrated necrosis, reduced proliferative activity, and
increased apoptosis. To ensure target engagement in the tumor
cells, we used bio-THZ1 to pull down CDK7 in cell lysates from
both vehicle- and THZ1-treated animals, noting decreased bind-
ing with bio-THZ1 in the latter (Figure S1H). We also confirmed
that the lack of toxicity in the animal models did not reflect
THZ1 selectivity for human CDK7, as bio-THZ1 formed a com-
plex with murine CDK7 in cell lysates from NIH 3T3 cells treated
with THZ1 (Figure S1H). These results demonstrate the feasibility
of specifically targeting CDK7 in tumor cells driven by deregu- 
lated MYC or MYCN. 

THZ1 Affects Both Transcription and Cell-Cycle 
Progression in MYCN-Amplified Cells

As a transcriptional kinase, CDK7 exerts its effects through regu-
lation of RNA Pol II-mediated transcriptional initiation and pause
establishment, while also affecting elongation through its CAK
activity on other transcriptional CDKs (Glover-Cutter et al.,
2009; Larochelle et al., 2012; Palancade and Bensaude, 2003).
We observed a dose-dependent decrease in the initiation-asso-
ciated serine 5 (S5) and serine 7 (S7) and the elongation-associ-
ated serine 2 (S2) Pol II phosphorylation in MYCN-amplified but
not nonamplified cells treated with THZ1 (Figures 2A and S2A).



Pol II CTD phosphorylation was also downregulated in tumor 
cells from animals treated with THZ1 (Figure 2B). Downregula-
tion of CDK7 phosphorylation was seen in MYCN-amplified cells
(Figure S2B), consistent with the finding that CDK7 is regulated
by phosphorylation within its own activation (T) loop (Larochelle
et al., 2012). Phosphorylation of CDK9 was also decreased in
MYCN-amplified cells (Figure S2C), reinforcing the effect of
THZ1 on transcription elongation. Total protein levels of CDK9
were also decreased in these cells but not in nonamplified cells,
suggesting that THZ1-induced CDK7 inhibition might also target
the transcription of CDK9 (Figure S2C). 

The decrease in Pol II phosphorylation after THZ1 treatment in 
MYCN-amplified cells coincided with the loss of the short-lived
antiapoptotic protein MCL1 (Figure 2C). Similar effects on Pol II
CTD phosphorylation and MCL1 levels were observed in Raji,
Daudi, and H262-BT111 cells, all of which depend on MYC or
MYCN overexpression and are sensitive to THZ1 (Figure S2D). 
Importantly, the effects of THZ1 , including Pol II CTD phosphor- 
ylation and the induction of apoptosis, could be rescued by a 
mutation in the covalent binding site of CDK7 (CDK7^^^^^) (Fig- 
ure S2E), indicating on-target effects of THZ1 on CDK7. 

CDK7 also stimulates cell-cycle progression by activating 
CDK1 and CDK2 through its T-loop phosphorylation function 
(Larochelle et al., 2007). We observed a time-dependent 



Cell 159, 1126-1139, November 20, 2014 ©2014 Elsevier Inc. 1129




[Figure 3 image. Recoverable labels: MYCN-amplified (Kelly, IMR-32) and MYCN-nonamplified (SH-SY5Y) cells; THZ1 (100 nM); heatmap rows CDK7, CDK17, CDK13, CDK8, CDK12, CDK9, CDK1, CDK4, CDK20, CDK2, CDK19, CDK11B, CDK11A, CDK6; color scale -2 to 2; axis label: log2 fold change.]



Figure 3. THZ1 Causes Massive Downregulation of Actively Transcribed Genes in MYCN-Overexpressing NB Cells

(A) Heatmap of gene expression values in MYCN-amplified and nonamplified cells treated with THZ1 (100 nM for 6 hr) versus DMSO. Rows show Z scores
calculated for each cell type.

(B) Quartile box plots of log2 fold changes in gene expression in MYCN-amplified and nonamplified cells treated with DMSO or THZ1 at the same dose and
duration as in (A). Box plot whiskers extend to 1.5 times the interquartile range (n = 18,665 expressed genes, p < 10“^^ for Kelly versus SH-SY5Y and IMR-32
versus SH-SY5Y, two-sided Mann-Whitney U test).

(C) Venn diagram depicting the overlap between sets of differentially expressed transcripts (THZ1 versus DMSO) in MYCN-amplified (Kelly, IMR-32) and
nonamplified (SH-SY5Y) cells treated with THZ1 as in (A). Red represents upregulated transcripts. Blue represents downregulated transcripts.

(D) Heatmap of gene expression values of transcriptional and cell-cycle CDKs in MYCN-amplified and nonamplified cells treated with THZ1 as in (A) versus
DMSO. Rows show Z scores calculated for each cell type.

(E) Correlation between log2 fold changes in gene expression following THZ1 (100 nM) versus DMSO treatment and actinomycin D (1 μM) versus DMSO treatment
for 6 hr in MYCN-amplified NB cells. R² (coefficient of determination) calculated using a simple linear regression model.

See also Figure S3.



decrease in CDK1 and CDK2 phosphorylation and in other proteins involved in
cell-cycle regulation (pRB and E2F) that was more striking in
MYCN-amplified cells (Figures 2D and S2F). However, the Pol II inhibitor
triptolide appeared to show selectivity for MYCN-overexpressing cells,
compared to purvalanol, which primarily targets cell-cycle CDKs
(Figure S2G). Thus, although THZ1 causes cytotoxicity through
transcriptional as well as cell-cycle inhibition in MYC-driven tumor cells,
the latter effect is most likely through decreased transcription of
cell-cycle CDKs.

CDK7 Inhibition Causes Widespread Transcriptional
Shutdown in MYCN-Amplified NB Cells

Because THZ1 preferentially downregulates RNA Pol II CTD phosphorylation in
MYCN-amplified cells (Figures 2A and S2A), we next compared the
consequences of CDK7 inhibition on global gene expression in MYCN-amplified
(Kelly and IMR-32) versus nonamplified (SH-SY5Y) cells following exposure
to 100 nM THZ1 for 6 hr. Expression profiling was performed with “spike-in”
RNA standards normalized to cell number to enable accurate detection of
differences in total RNA levels in cells with or without MYCN
overexpression (Loven et al., 2012).

We observed a genome-wide decrease in the mRNA expression levels of most
actively transcribed genes in MYCN-amplified cell lines compared to
DMSO-treated cells (Figure 3A). This widespread downregulation of gene
expression was not apparent in MYCN-nonamplified SH-SY5Y cells treated with
the same dose and duration of THZ1 (Figure 3A). Indeed, the percentage of
actively transcribed genes significantly downregulated in these cells was
only 7.5%, contrasting with 63% for Kelly and 68% for IMR-32 MYCN-amplified
cells (false discovery rate [FDR] <0.05) (Figure 3B). Moreover, Kelly and
IMR-32 cells had 55% (n = 10,292) downregulated transcripts in common,
while only 12% (n = 1,225) of these were also downregulated in SH-SY5Y
cells (out of 18,665 genes) (Figure 3C). The most downregulated genes and
gene sets in the MYCN-amplified cells were transcriptional and cell-cycle
regulators (Figures S3A and S3B), including transcripts of CDKs and their
partner cyclins (Figures 3D and S3C). Comparison of expression profiles of
MYCN-amplified cells exposed to THZ1 with those exposed to the general
transcription inhibitor actinomycin D revealed a significant correlation
between the two (R² = 0.76) (Figure 3E). These results led us to conclude
that CDK7 inhibition, through THZ1, preferentially downregulates global
transcription in MYCN-amplified NB cells, affecting most active genes.
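
The quantities reported in this paragraph (per-gene log2 fold changes, the share of genes that go down, and the R² between two treatments) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names and toy values are hypothetical, and the downregulation call here uses a plain fold-change cutoff, whereas the text applies an FDR < 0.05 criterion.

```python
import math

def log2_fold_changes(treated, control):
    """Per-gene log2(treated / control) for paired expression values."""
    return [math.log2(t / c) for t, c in zip(treated, control)]

def fraction_downregulated(lfc, cutoff=0.0):
    """Share of genes whose log2 fold change lies below the cutoff.

    Assumption: a bare cutoff stands in for the FDR-based call in the text.
    """
    return sum(1 for x in lfc if x < cutoff) / len(lfc)

def r_squared(x, y):
    """Coefficient of determination for a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)

# Hypothetical expression values for four genes (arbitrary units).
dmso = [100.0, 80.0, 100.0, 40.0]
thz1 = [50.0, 20.0, 100.0, 80.0]

lfc_thz1 = log2_fold_changes(thz1, dmso)   # [-1.0, -2.0, 0.0, 1.0]
lfc_actd = [-2.0, -4.0, 0.0, 2.0]          # hypothetical actinomycin D changes

share_down = fraction_downregulated(lfc_thz1)   # 0.5
r2 = r_squared(lfc_thz1, lfc_actd)              # 1.0 for this perfectly linear toy case
```

With real data, `lfc_thz1` and `lfc_actd` would each hold one value per expressed gene, and `r2` would correspond to the kind of summary shown in Figure 3E.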

THZ1 Targets Deregulated MYCN in NB Cells 

The selectivity of THZ1 for MYC/MYCN-deregulated cells led us to
investigate its effects on MYCN expression and function. We observed that
MYCN mRNA was among the top 15% of significantly downregulated transcripts
in NB cell lines most sensitive to THZ1. Moreover, transcripts
differentially regulated on exposure to THZ1 showed enrichment for
previously published MYCN/MYC target gene sets (Figure S4A). THZ1 treatment
resulted in a dose- and time-dependent downregulation of MYCN mRNA and
protein levels in MYCN-amplified cells (Figures 4A and 4B), which were
rescued by overexpression of the CDK7 C312S mutant (Figure S4B), similar to
observations with an ATP analog in colorectal carcinoma cells expressing a
CDK7 mutation (Glover-Cutter et al., 2009). We also observed downregulation
of MYCN protein levels in the human NB tumors from mice that responded to
THZ1 (Figure 4C). These effects primarily reflected a reduction in MYCN
transcripts, as comparison of MYCN protein turnover in MYCN-amplified cells
exposed to either THZ1 or actinomycin D showed largely similar rates of
degradation (Figure S4C). Finally, analysis of MYCN occupancy at the
promoters of two of its known transcriptional targets, MDM2 (Slack et al.,
2005) and MCL1 (Labisso et al., 2012), revealed significantly decreased
MYCN binding, consistent with the disruption of MYCN function by THZ1
(Figure 4D).

Next, we determined whether genetic depletion of MYCN mimicked the effects
of THZ1 in MYCN-amplified cells. Abrogation of amplified MYCN expression in
Kelly cells using shRNA knockdown led to decreased Pol II CTD
phosphorylation at S2, S5, and S7 (Figure 4E), similar to our findings with
THZ1 treatment. The inhibition of transcription associated with MYCN
knockdown was coupled with complete loss of MCL1 and subsequent cell death
(Figure 4E). We next asked whether ectopic expression of MYCN would
sensitize cells to THZ1. MYCN-nonamplified SH-SY5Y cells were engineered to
express the oncogene by lentiviral transduction, resulting in approximately
380-fold increases in MYCN mRNA and protein levels (Figure 4F), equivalent
to those in MYCN-amplified cells (Figure S1D). Ectopic expression of MYCN
in these cells led to a 5-fold greater sensitivity to THZ1 compared with
untransfected or vector control-transfected SH-SY5Y cells (Figure 4G).
Together, these results indicate that THZ1 specifically targets deregulated
MYCN in NB cells.

Amplified MYCN Induces Global Transcriptional 
Amplification that Is Inhibited by THZ1 

Downregulation of MYCN alone seemed insufficient to account for the
widespread effects of THZ1 on the transcriptional output of cells harboring
amplified MYCN. We therefore questioned whether deregulated MYCN, like MYC,
binds promiscuously to active genes, leading to global transcriptional
amplification, and whether THZ1 preferentially disrupts this effect.
Indeed, ectopic expression of MYCN in SH-SY5Y cells led to increased cell
size (Figure S5A) and significantly increased expression of all active
transcripts (Figures 5A and S5B). Moreover, THZ1 treatment led to
significant downregulation of transcriptional output in these cells
(Figure 5B) with induction of apoptosis, although MYCN protein levels were
not affected (Figure 5C). Furthermore, shRNA knockdown of MYCN in
MYCN-amplified cells led to a striking downregulation of global gene
expression in comparison to their control shRNA-expressing counterparts
(Figures 5D and 5E), again underscoring the widespread transcriptional
effects of deregulated MYCN. To determine the extent to which these effects
mirrored those of CDK7 inhibition, we compared the expression profiles of
MYCN-amplified cells expressing a MYCN shRNA with those of THZ1-treated
cells, noting significant downregulation of the actively transcribed genes
in both types of cells, with a close correlation between the top 500
downregulated transcripts in THZ1-treated cells and those in MYCN
shRNA-depleted cells (Figures 5F and S5C). Hence, disruption of
MYCN-induced transcriptional amplification also contributes to the striking
effects of THZ1.

Super-Enhancers Account for the Selectivity of THZ1
for MYCN-Amplified NB Cells

We observed that the ectopic overexpression of MYCN in nonamplified cells
led to global transcriptional upregulation and sensitization to the growth
inhibitory effects of THZ1, although the latter effect did not approach the
levels seen in cells with endogenously amplified MYCN. We therefore
investigated the possible contribution of super-enhancers (SEs) to the
extreme susceptibility of endogenous MYCN-amplified cells to THZ1. Using
chromatin immunoprecipitation with high-throughput sequencing (ChIP-seq) of
histone H3K27 acetylation (H3K27ac), a mark of active enhancers (Hnisz
et al., 2013), we first identified enhancer regions in MYCN-amplified Kelly
and nonamplified SH-SY5Y cells (Figure S6A). In both cell types, a subset
of H3K27ac-bound enhancers (Kelly, 6.1%; SH-SY5Y, 5.4%) had a significantly
higher signal than regular enhancers and were therefore classified as SEs
(Loven et al., 2013; Whyte et al., 2013) (Figure 6A). We also analyzed the
occupancy of H3K4 monomethylation, a mark that colocalizes with H3K27ac in
strong enhancers, noting a high concordance (94%-99%) with enhancers
identified by H3K27ac analysis (Figures 6B and S6B).

The SEs in MYCN-amplified cells were either unique or disproportionately
loaded with H3K27ac, with the largest SE being associated with the MYCN
oncogene itself, having an H3K27ac signal that was at least ten times
greater than that of any other SE in either these or MYCN-nonamplified
cells (Figures 6A and 6B).

Figure 4. Cytotoxic Effects of THZ1 Are Mediated in Part through Inhibition of MYCN Expression

(A) Quantitative RT-PCR (qRT-PCR) analysis of MYCN RNA expression in MYCN-amplified Kelly cells treated with THZ1 50 nM for 0-6 hr (left) or 0-250 nM for 6 hr
(right). Data normalized to β-actin are presented as mean ± SD of three biological replicates.

(B) Immunoblot analysis of MYCN protein expression in MYCN-amplified NB cells treated with the indicated doses of THZ1 for 6 or 24 hr.

(C) IHC analysis of MYCN protein expression in MYCN-amplified human NB xenograft models treated with either DMSO or 10 mg/kg i.v. twice daily of THZ1 for
12 days. Scale bars represent 25 μm.

(D) ChIP-qPCR analysis of MYCN binding at the promoters of candidate target genes in MYCN-amplified cells following treatment with THZ1, 100 nM for 3 hr.
Mean ± SD values for three replicate experiments are shown. **p < 0.01 (Student’s t test).

(E) Immunoblot analysis of the indicated proteins in MYCN-amplified NB cells expressing either an shRNA control (shGFP) or an shRNA directed against MYCN
(shMYCN). Two different hairpins against MYCN [shMYCN(1) and shMYCN(3)] were used with similar results.

(F) qRT-PCR analysis of MYCN expression in SH-SY5Y MYCN-nonamplified cells engineered to express either a control vector (pLenti 6.3) or MYCN (pLenti 6.3-MYCN)
(left). Immunoblot analysis of MYCN protein expression in these cells (right).

(G) Cell viability analysis of untransfected (SH-SY5Y), control vector-expressing (pLenti 6.3) or MYCN-expressing (pLenti 6.3-MYCN) MYCN-nonamplified NB
cells treated for 72 hr with increasing doses of THZ1. Results are means ± SD of three replicates.

See also Figure S4. 



Because the MYCN SE in Kelly cells is contained within the 
MYCN amplicon, we questioned whether its SE status merely re- 
flected the increased number of copies of the amplified gene. To 
investigate this possibility, we estimated the signal of a single 



copy of MYCN and recalculated its rank within the list of en- 
hancers. A single copy of the MYCN enhancer still fell within 
the group of SEs (ranked 250 out of 746), indicating that 
increased gene copy number does not account for the high SE

Figure 5. THZ1 Inhibits MYCN-Driven Transcriptional Amplification

(A) Box plot depicting the log2 fold changes in actively transcribed genes in
MYCN-nonamplified SH-SY5Y cells transduced with MYCN (pLenti 6.3-MYCN)
compared with a control vector (pLenti 6.3). Box plot whiskers extend
to 1.5 times the interquartile range (n = 18,665 expressed genes; p < 2.2 ×
10^-16, two-sided Mann-Whitney U test).



signal associated with MYCN. The majority of genes associated with the
remaining top-ranked SEs were master transcription factors critical to
sympathetic neuronal development and cell identity, including PHOX2B
(Pattyn et al., 1999; Stanke et al., 1999), GATA2 (Tsarovina et al., 2004),
HAND2 (Howard et al., 2000), and DBH (Mercer et al., 1991) (Figures 6A and
6C). Another major oncogenic driver in NB, the receptor tyrosine kinase ALK
(George et al., 2008), was associated with a unique top-ranked SE in
MYCN-amplified cells (Figures 6A and 6C). In general, the SE domains
associated with MYCN amplification correlated with pathways involved in the
regulation of transcription, while those in nonamplified cells were
primarily related to cell adhesion, invasion, metastasis, and receptor
tyrosine kinase pathways (Figures 6D and S6C). To ensure that the SEs in
the cultured cells mirrored those of human NB, we analyzed H3K27ac
occupancy in three primary tumors (Figures 6E and S6D). Consistent with our
cell line data, the largest SE in MYCN-amplified samples NB#1 and NB#2 was
associated with MYCN (Figures 6E and S6E). SEs associated with PHOX2B and
HAND2 were identified in all tumor samples (Figure 6E), in keeping with
their lineage-specific role in sympathetic neuronal development.

To determine whether the occurrence of SEs correlates with increased Pol II
occupancy, we used ChIP-seq analysis of Pol II and observed that the
proximal promoter regions of SE-associated genes had higher levels of
Pol II occupancy than those of genes associated with regular enhancers in
both MYCN-amplified and nonamplified cells, although the difference was
much more striking in the former (Figure 7A). THZ1 treatment led to a
striking reduction in Pol II binding at the promoter regions and gene
bodies of SE-associated genes in MYCN-amplified but not nonamplified cells
(Figure 7A). Pol II binding at regular enhancer-associated genes was not
appreciably affected by THZ1 in either cell type (Figure 7A). The reduction
in Pol II occupancy with THZ1 was observed at the transcription start and
end sites of genes associated with the top-ranked SEs, including MYCN,
suggesting that their expression was severely curtailed in these cells
(Figure 7B). Indeed, RNA expression of SE-associated genes was
significantly reduced after THZ1 treatment in MYCN-amplified cells, whereas
expression of the same genes was not significantly reduced in
MYCN-nonamplified cells (Figure 7C). Thus, not only do SEs in MYCN-amplified



(B) Heatmap of all gene expression values in MYCN-nonamplified SH-SY5Y
cells versus SH-SY5Y cells overexpressing MYCN (pLenti 6.3-MYCN)
following either THZ1 (100 nM for 6 hr) or DMSO treatment. Rows show Z
scores calculated for each vector.

(C) Immunoblot analysis of MYCN, CC3, and cleaved PARP in SH-SY5Y
MYCN-nonamplified cells engineered to express either a control vector (pLenti
6.3) or MYCN (pLenti 6.3-MYCN) following treatment with THZ1.

(D) Heatmap of differentially expressed genes in MYCN-amplified NB cells
expressing a MYCN shRNA versus control shRNA. Rows show Z scores
calculated for each vector.

(E) Box plots of log2 fold changes in gene expression in MYCN-amplified NB
cells expressing a MYCN shRNA versus a control shRNA. Box plot whiskers
extend to 1.5 times the interquartile range (n = 18,665 expressed genes; p <
10“''^ for shMYCN versus shGFP, two-sided Mann-Whitney U test).

(F) GSEA plot depicting the correlation between the top 500 downregulated
genes following THZ1 treatment and the rank-ordered genes that are
differentially expressed after MYCN knockdown in MYCN-amplified NB cells.

See also Figure S5.

Figure 6. NB Cells Possess Unique Super-Enhancer Landscapes

(A) H3K27ac signal across enhancer regions for all enhancers in MYCN-amplified and nonamplified cells. SEs were defined as enhancers surpassing the
threshold signal of 8,802 in both cell types. In Kelly and SH-SY5Y cells, 6.1% (746/12,200) and 5.4% (1,136/20,887) of the enhancers comprised 52% and 37%,
respectively, of all H3K27ac-bound enhancer signal and were classed as SEs.

(B) ChIP-seq profiles for H3K27ac, H3K4me1, and RNA Pol II binding at the MYCN SE gene locus in Kelly cells. The x axis shows genomic position and the y axis
the signal of histone mark or Pol II binding in units of reads per million per base pair (rpm/bp). The gene model is depicted below and scale bars above the binding
profiles.

(C) ChIP-seq profiles for H3K27ac binding at representative SE-associated gene loci in MYCN-amplified and nonamplified cells. The x and y axes are as
described in (B).

(D) Gene Ontology (GO) molecular functions of SE-associated genes in MYCN-amplified (Kelly) and nonamplified (SH-SY5Y) cells identified using GREAT analysis
(McLean et al., 2010).

(E) H3K27ac loading across enhancers in MYCN-amplified and nonamplified primary NB tumors. SEs were defined as having a threshold signal of 2,427 (NB#1),
5,933 (NB#2), and 5,952 (NB#3). In NB#1 and NB#2, 8.5% (164/1,920) and 3.0% (109/3,561), and in NB#3, 6.4% (522/8,040) of the enhancers were classified
as SEs.

See also Figure S6.

Figure 7. Sensitivity of MYCN-Amplified Cells to THZ1 Correlates with MYCN-Associated Super-Enhancers

(A) Metagene representation of global Pol II ChIP-seq occupancy at SE- and RE-associated genes in cells with (Kelly) or without (SH-SY5Y) MYCN amplification
treated with DMSO (blue) or THZ1 (red) (100 nM × 3 hr). TSS, transcription start sites.

(B) Gene tracks of RNA Pol II binding density at representative SE-associated gene loci after DMSO or THZ1 treatment as in (A).

(C) Left: quartile box plots of log2 fold changes in the top 231 genes associated with SEs and regular enhancers (RE) in MYCN-amplified cells treated with THZ1
(100 nM × 6 hr) versus DMSO. Box plot whiskers extend to 1.5 times the interquartile range (n = 230 SE; n = 231 RE; p < 10“^, two-sided Mann-Whitney U test).
Right: quartile box plots of log2 fold changes in gene transcripts associated with the top-ranked SEs unique to MYCN-amplified Kelly cells treated with THZ1 as in
(A) versus DMSO, compared with the expression changes of the same genes in similarly treated SH-SY5Y nonamplified cells. Box plot whiskers extend to 1.5
times the interquartile range (n = 673; p < 10“^®, two-sided Mann-Whitney U test).

(D) Proposed mechanism for the action of THZ1 in MYCN-amplified NB. Oncogenic MYCN is regulated by super-enhancers leading to its high-level expression
(left). Overexpressed MYCN invades the promoter and enhancer regions of all active genes, including itself, to induce global transcriptional upregulation (right).
THZ1 targets the expression of both MYCN and MYCN-driven transcriptional amplification.

cells cause high-level expression of oncogenic drivers and
genes that determine cell identity, but they also provide the basis
for the selectivity of THZ1.

DISCUSSION 

Here, we demonstrate that the genomically amplified MYCN oncogene, by
promoting the development of SEs, causes upregulation of the active
transcriptional program of NB cells, sensitizing them to inhibition of
CDK7, a widely expressed regulator of transcription and cell-cycle
transition. This effect suppresses essentially all active transcripts,
especially those responsible for the uncontrolled proliferation that
characterizes the cancer cell state, leading to potent and selective
cytotoxicity while sparing normal cells. The high potency of our CDK7
inhibitor, THZ1, can be explained in part by its unprecedented mode of
binding (the formation of a covalent bond with a unique cysteine residue
outside the kinase domain), resulting in prolonged and irreversible CDK7
inactivation (Kwiatkowski et al., 2014). This feature contrasts with other
transcriptional CDK inhibitors, including flavopiridol, roscovitine, and
CR-8 (Table S1), all of which rely on ATP-competitive modes of binding and
whose effects on transcription would therefore be expected to be transient.
The selectivity of THZ1 for MYCN-amplified cells can be attributed to the
perturbation of SEs associated with deregulated MYCN and other oncogenic
drivers.

Despite risk-based treatment strategies for MYCN-amplified NB, responses
tend to be brief, with early relapses that are almost uniformly fatal
(Matthay et al., 1999). Finding suitable ways to inhibit MYCN, or any MYC
family member for that matter, has been notoriously difficult. A recent
approach is the modulation of bromodomain and extraterminal (BET) domain
coactivator proteins, such as BRD4, which associate with acetylated
chromatin and promote transcriptional activation (Delmore et al., 2011;
Loven et al., 2013; Puissant et al., 2013). After successful testing of the
BRD4 inhibitor JQ1 in MYC-dependent multiple myeloma models (Delmore
et al., 2011), a positive correlation between MYCN amplification and
sensitivity to JQ1 was demonstrated in the majority of MYCN-amplified NB
models tested (Puissant et al., 2013). However, others reported growth
inhibition of NB using the structurally distinct BRD4 inhibitor I-BET726,
irrespective of MYCN amplification status or MYC/MYCN expression levels
(Wyce et al., 2013). The intervention we describe has a number of features
that would support the clinical development of covalent CDK7 inhibitors for
use against MYCN-amplified NB and other MYC-driven cancers. These include
the ability to achieve potent growth inhibition with relatively low doses
of THZ1, the lack of off-target effects, and the absence of discernible
toxicity in a mouse model of NB. THZ1 has been demonstrated to cross-react
with CDK12/13 (Kwiatkowski et al., 2014); however, shRNA knockdown of these
kinases in MYCN-amplified cells did not affect MYCN levels (Figure S1B).
Furthermore, overexpression of CDK7 C312S restored MYCN levels to near
amplified levels (Figure S4B). These results would suggest that loss of
CDK7 activity dominates the selective effects of THZ1 in MYCN-amplified NB
cells.

Multiple studies argue that MYC is a global transcriptional amplifier (Lin
et al., 2012; Loven et al., 2012; Nie et al., 2012; Schuhmacher and Eick,
2013), although a recent report has suggested that a portion of this
amplification is indirect (Sabo et al., 2014). We believe the
transcriptional amplification effects of MYC are direct at the vast
majority of active genes and that differences in interpretation of direct
versus indirect effects are due to differences in interpretation of
ChIP-seq data thresholds. We demonstrate here that deregulated MYCN
functions as a transcriptional amplifier of the cell’s existing gene
expression program rather than a specific target gene set. This effect was
not detectable in earlier data sets of MYCN-amplified tumor samples because
of a limitation of microarray analysis, which in conventional approaches
assumes similar levels of total RNA in all the samples (Loven et al.,
2012).

Although in principle, blockade of CDK7 function would be expected to
inhibit the expression of any gene poised to undergo transcription, our
direct demonstration that manipulation of MYCN levels can modulate the
sensitivity to THZ1 supports the idea that MYC-induced transcriptional
amplification is crucial to THZ1 activity. This transcriptional dependency
was illustrated by the massive transcriptional shutdown in MYCN-amplified
cells with relatively low doses of THZ1 (Figure 3). By contrast, although
CDK transcripts were downregulated by THZ1 in MYCN-nonamplified cells, this
effect did not produce the same profound impact seen in MYCN-amplified
cells, providing further evidence that aberrant MYCN expression and the
resultant global amplification of transcription are necessary for cell
sensitivity to THZ1. This critical dependency on transcriptional
amplification would account for the selective killing of MYCN-amplified
cells by THZ1 without inducing toxic side effects in mice. The unique SE
landscapes of MYCN-amplified cells also dictate their response to THZ1.
Indeed, genes involved in the oncogenic state, such as MYCN and ALK, and
those that specify sympathetic neuronal cell identity, including PHOX2B and
HAND2, were associated with SEs and were far more susceptible to THZ1 than
either their counterparts in MYCN-nonamplified cells or genes driven by
regular enhancers. This selectivity of THZ1 for MYCN-amplified cells could
stem from its inhibition of the increased Pol II activity at SE-associated
genes, leading to downregulation of MYCN expression as well as that of the
entire transcriptional program of the cell.

Our data suggest that the cytotoxicity induced by THZ1 in MYCN-driven NB
cells is mediated by inhibition of both MYCN expression and MYCN-stimulated
global transcriptional amplification (Figure 7D). Oncogenic MYCN invades
the core promoters and enhancer regions of actively transcribed genes,
resulting in the formation of SEs at key genes underlying the cancer cell
state, including MYCN, and ultimately in transcriptional amplification.
THZ1, by irreversibly inhibiting CDK7, blocks transcription of MYCN
(Figure 4) as well as MYCN-induced amplification of global gene
transcription. The latter conclusion is based on the widespread inhibition
of transcription in MYCN-amplified cells (Figure 3), the highly significant
correlation between the transcripts that were downregulated by THZ1 and
those resulting from genetic depletion of MYCN, and the fact that ectopic
expression of MYCN in non-MYCN-overexpressing cells sensitizes these cells
to widespread transcriptional inhibition by THZ1 (Figures 4 and 5).
Together, these two processes effectively deprive tumor cells of the
proliferative signals to which they have become addicted, leading to
apoptotic death. Nevertheless, this mechanism is not necessarily specific
to deregulated MYCN or MYC. Rather, we suggest that inhibition of CDK7 or
perhaps other transcriptional CDKs would be effective in any context in
which tumor cells rely on high-level expression of one or more oncogenic
transcription factors for their aberrant growth and survival. However, the
net inhibitory effect is likely to be much more profound in cancers that
depend on MYC (or MYCN) as the oncogenic driver.

In conclusion, we use NB and a newly developed transcrip- 
tional CDK inhibitor to demonstrate the potency and selectivity 
of a potential treatment strategy that targets global transcrip- 
tional amplification in MYC-driven tumors. Our results suggest 
that the mechanism by which oncogenic MYC gains control 
over myriad cellular processes to induce tumor formation also 
exposes a vulnerability that can be exploited therapeutically. 
By using THZ1 to disable essential components of the cancer 
cells’ transcriptional machinery, we show that widespread sup- 
pression of transcription in MYC-dependent cancers is not 
only feasible, but also spares normal cells from toxicity. 

EXPERIMENTAL PROCEDURES 
Cell Lines 

Human NB cell lines (from the Children’s Oncology Group) and Raji and Daudi 
cells (provided by D. Sabatini, Massachusetts Institute of Technology) were 
grown in RPMI-1640, 10% fetal bovine serum (FBS), and 1% penicillin/strep- 
tomycin (pen/strep; Invitrogen). H262-BT111 primitive neuroectodermal tumor 
cells (provided by K. Ligon, Dana-Farber Cancer Institute [DFCI]) were grown in 
neuronal stem cell (NSC) media with heparin, NSC supplement (Stem Cell 
Technologies), 1% pen/strep, epidermal growth factor (EMD Millipore), and
fibroblast growth factor (Life Technologies). NIH 3T3 cells (purchased from 
ATCC) and B6-MEFs (provided by A. Kung, Columbia University) were grown 
in DMEM with 10% FBS and 1% pen/strep. 

Animal Studies 

These experiments were performed with approval from the Institutional Animal 
Care and Use Committee of the DFCI. Full details are given in Extended Exper- 
imental Procedures. 

Synthetic RNA Spike-In and Microarray Analysis 

RNA samples were prepared as previously described (Loven et al., 2012). Total
RNA was spiked-in with RNA Spike-In Mix (Ambion), treated with DNA-free
DNase I (Ambion), analyzed on an Agilent 2100 Bioanalyzer (Agilent Technologies)
for integrity, and hybridized to Affymetrix GeneChip PrimeView Human Gene
Expression arrays (Affymetrix). Data analysis is described in Extended Exper- 
imental Procedures. 
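
The idea behind spike-in normalization can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the procedure given in the Extended Experimental Procedures: it assumes spike-in RNA was added in proportion to cell number, and it rescales each sample with a single median-based factor. The function names and toy intensities are hypothetical.

```python
from statistics import median

def spike_in_scale_factors(spike_signals):
    """Per-sample factors that equalize the median spike-in intensity.

    spike_signals maps sample name -> list of spike-in probe intensities.
    Assumption: one multiplicative factor per sample is enough; published
    pipelines may fit a curve across the spike-in range instead.
    """
    medians = {sample: median(values) for sample, values in spike_signals.items()}
    target = sum(medians.values()) / len(medians)
    return {sample: target / m for sample, m in medians.items()}

def normalize(expression, factors):
    """Rescale each sample's gene intensities by its spike-in factor."""
    return {
        sample: [value * factors[sample] for value in values]
        for sample, values in expression.items()
    }

# Hypothetical raw intensities. The THZ1 sample shows twice the spike-in
# signal, which implies it holds less cellular RNA per unit of spike-in.
spikes = {"DMSO": [100.0, 200.0, 300.0], "THZ1": [200.0, 400.0, 600.0]}
genes = {"DMSO": [10.0, 20.0], "THZ1": [10.0, 20.0]}

factors = spike_in_scale_factors(spikes)   # {"DMSO": 1.5, "THZ1": 0.75}
normalized = normalize(genes, factors)     # THZ1 values end up half the DMSO values
```

In this toy case the raw gene intensities look identical in both samples, and only the spike-in rescaling exposes the global drop, which is the situation that conventional normalization would hide.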

ChIP-Seq 

ChIP was carried out as previously described (Lee et al., 2006) with minor
changes described in the Extended Experimental Procedures. The following
antibodies were used: RNAPII (Santa Cruz, sc-899), H3K4me1 (Abcam,
ab8895), and H3K27ac (Abcam, ab4729).

Data Analysis 

All ChIP-seq data were aligned using the short-read aligner Bowtie (version 
0.12.7) (Langmead et al., 2009) to build version GRCh37/HG19 of the human 
genome. To visualize ChIP-seq tracks, reads were extended by 160 bases, 
converted into tdf files using igvtools (version 2.2.1), and visualized in IGV 
(Robinson et al., 2011). ChIP-seq peaks were detected using the peak-finding 
algorithm MACS version 1.4.2 (Zhang et al., 2008) with the default p value 
enrichment threshold of 1 × 10⁻⁵ for all data sets. Active enhancers, ranked 
according to the magnitude of the H3K27ac signal, were defined as regions of 
ChIP-seq enrichment for H3K27ac and H3K4me1 outside of promoters. To identify 
super-enhancers, we employed the ROSE algorithm 
(https://bitbucket.org/young_computation/rose) (Loven et al., 2013; Whyte 
et al., 2013) to rank the enhancers identified using MACS: peaks within 
12.5 kb of one another were stitched together, and the stitched enhancers 
were ranked by their difference in H3K27ac signal versus input signal. 
Further details are given in the Extended Experimental Procedures. 
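
The stitching and ranking step described above can be illustrated with a minimal sketch. This is not the ROSE implementation: the tuple layout, the function name `stitch_and_rank`, and the choice to sum per-peak signals within a stitched region are assumptions made for illustration, and all peaks are assumed to lie on a single chromosome.

```python
from typing import List, Tuple

# Hypothetical peak record: (start, end, h3k27ac_signal, input_signal)
Peak = Tuple[int, int, float, float]


def stitch_and_rank(peaks: List[Peak], max_gap: int = 12_500) -> List[Tuple[int, int, float]]:
    """Stitch peaks lying within max_gap bp of one another, then rank the
    stitched regions by (H3K27ac signal - input signal), highest first."""
    stitched: List[list] = []
    for start, end, chip, ctrl in sorted(peaks):
        if stitched and start - stitched[-1][1] <= max_gap:
            # Close enough to the previous region: extend it and pool signal.
            prev = stitched[-1]
            prev[1] = max(prev[1], end)
            prev[2] += chip
            prev[3] += ctrl
        else:
            stitched.append([start, end, chip, ctrl])
    ranked = [(s, e, chip - ctrl) for s, e, chip, ctrl in stitched]
    ranked.sort(key=lambda region: region[2], reverse=True)
    return ranked
```

With the 12.5 kb default, two peaks separated by 8 kb are merged into one region, whereas a peak 29 kb downstream remains a separate region.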

ACCESSION NUMBERS 

The data discussed in this publication have been deposited in NCBI’s Gene 
Expression Omnibus and are accessible through GEO series accession num- 
ber GSE62726. 

SUPPLEMENTAL INFORMATION 

Supplemental Information includes Extended Experimental Procedures, six 
figures, and one table and can be found with this article online at 
http://dx.doi.org/10.1016/j.cell.2014.10.024. 
SUMMARY 

Mammalian transcriptomes display complex circa- 
dian rhythms with multiple phases of gene expres- 
sion that cannot be accounted for by current models 
of the molecular clock. We have determined the un- 
derlying mechanisms by measuring nascent RNA 
transcription around the clock in mouse liver. Unbi- 
ased examination of enhancer RNAs (eRNAs) that 
cluster in specific circadian phases identified func- 
tional enhancers driven by distinct transcription fac- 
tors (TFs). We further identify on a global scale the 
components of the TF cistromes that function to 
orchestrate circadian gene expression. Integrated 
genomic analyses also revealed mechanisms by 
which a single circadian factor controls opposing 
transcriptional phases. These findings shed light on 
the diversity and specificity of TF function in the gen- 
eration of multiple phases of circadian gene tran- 
scription in a mammalian organ. 

INTRODUCTION 

A substantial proportion of mammalian genes are expressed 
with a circadian rhythm driven by a cell autonomous molecular 
clock (Hughes et al., 2009; Miller et al., 2007; Panda et al., 
2002). The clock mechanism involves a network of 
transcriptional-translational feedback loops composed of the core 
transcriptional activators BMAL1/CLOCK and two sets of repressors, 
PER/CRY (Reppert and Weaver, 2001; Takahashi et al., 2008) and 
Rev-erbs α and β (Bugge et al., 2012; Cho et al., 2012; Ripperger and 
Schibler, 2001). Under normal conditions, each cellular clock is 
synchronized by systemic cues and generates multiple phases of rhythmic 
output (Asher and Schibler, 2011; Dibner et al., 2010; Peek et al., 2012). 

Although each circadian transcription factor (TF) binds DNA 
with genome-wide oscillation peaking at a specific time (Feng 
et al., 2011; Koike et al., 2012; Rey et al., 2011), binding of 
an individual circadian TF, e.g., BMAL1, has been reported at 
genes oscillating with a range of phases, many of which do 
not correlate with the circadian regulator’s binding phase (Menet 
et al., 2012). Moreover, genome-wide studies have revealed 
a substantial portion of circadian TF binding tens to 
hundreds of kilobases away from known transcription start 
sites (TSS) (Feng et al., 2011; Koike et al., 2012; Rey et al., 
2011) and a high degree of overlap between core clock TFs 
with competing effects on circadian rhythms, such as BMAL1 
and Rev-erbα (Cho et al., 2012; Koike et al., 2012). Furthermore, 
several clock output TFs have been suggested to 
generate transcriptional rhythms with delayed phase relative 
to BMAL1/CLOCK, but these mechanisms have not been 
explored genome-wide (Asher and Schibler, 2011). Thus, a 
fundamental question remains as to how the interaction of multiple 
regulators at the genome, particularly at distal enhancer 
elements, produces distinct phases of circadian transcriptional 
activity. 

Here, we applied Global Run-On sequencing (GRO-seq) (Core 
et al., 2008; Wang et al., 2011) to mouse livers collected at mul- 
tiple times of day to measure the circadian activity of enhancer 
regions based on enhancer RNA (eRNA) transcription (Hah 
et al., 2013; Kim et al., 2010). We identified thousands of oscil- 
lating enhancers with varying peak activity times, and in partic- 
ular, we found that specific phases of oscillation are associated 
with distinct regulatory motifs and TF binding patterns. Our data 
suggest that specific phases of enhancer activity in vivo are 
achieved by a dominant regulator at each site, determined in 
part by sequence content, in contrast to combinatorial regula- 
tion models based primarily on synthetic in vitro models (Ukai- 
Tadenuma et al., 2008). Furthermore, we show that eRNA 
oscillations are highly predictive of the rhythmicity and phase 
of transcription at nearby genes, demonstrating a large-scale 
and previously unexplored role for distal regulatory elements in 
the generation of transcriptional rhythms. By combining circa- 
dian enhancer maps, transcription factor cistromes, and genetic 
ablation of Rev-erbα and Clock, we demonstrate that circadian 
eRNAs can be used to both identify the TFs coordinating specific 
phases of gene transcription and, importantly, uniquely distin- 
guish the functional binding sites within a circadian TF cistrome. 
Thus, an integrative approach using multiple genomic tech- 
niques provides the most detailed and robust model to explain 
the generation and coordination of multiple phases of rhythm 
within a single tissue. 
Figure 1. Circadian Transcription in Mouse Liver 

(A) Genome browser view of nascent transcripts at 
Bmal1/Arntl and Rev-erbα/Nr1d1 loci at eight time 
points. GRO-seq signals on the + and - strand are 
illustrated in blue and red, respectively. Y axis 
scale refers to the normalized tag count per million 
reads. 

(B) Heat map of the relative transcription of 1,261 
oscillating genes sorted by oscillation phase. 

(C) Relative expression of pre-mRNA (green) and 
mRNA (black) determined by GRO-seq and RT-qPCR, 
respectively, throughout the day. Data are 
double plotted for better visualization. RT-qPCR 
data are expressed as the mean ± SEM (n = 3-4 
per time point) and normalized to the maximal 
expression of the day. 

See also Figure S1 and Table S1. 



RESULTS 

Circadian Transcription in Mouse Liver 

GRO-seq was performed on mouse liver nuclei collected every 
three hours throughout a 24 hr light-dark cycle. Transcription 
of known circadian genes showed robust oscillation patterns, 
exemplified by Bmal1 (Arntl) and Rev-erbα (Nr1d1) (Figure 1A). 
A total of 11,288 active gene transcripts were identified, of 
which 1,261 (11%) were transcribed with oscillating patterns 
(JTK_CYCLE [Hughes et al., 2010], p < 0.01, 21 < period (τ) 
< 24 hr, peak to trough ratio > 1.5) (Figure 1B; Table S1A 
available online). Rhythmic mRNA expression of known circadian 
genes determined by RT-quantitative PCR (RT-qPCR) was 
associated with their nascent transcription (Figure 1C), and biological 
replicates of GRO-seq samples at Zeitgeber Time (ZT) 10 and 
ZT22 showed a high degree of correlation (Pearson correlation 
coefficient, r = 0.95) (Figure S1A). In addition, genes oscillating 
in similar phases showed closely related biological functions 
(Figure S1B; Table S1B). Together, these results demonstrate 
the robustness of our data. 
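
The oscillation criteria quoted above can be expressed as a simple filter over per-transcript rhythmicity calls. The sketch below is illustrative only: the `RhythmCall` record and its field names are hypothetical, the p value and period are assumed to come from a separate JTK_CYCLE run, and the period bounds are applied as strict inequalities exactly as printed.

```python
from dataclasses import dataclass


@dataclass
class RhythmCall:
    """Hypothetical per-transcript summary of a JTK_CYCLE result."""
    name: str
    p_value: float    # rhythmicity p value
    period_hr: float  # estimated period in hours
    peak: float       # maximal signal over the time course
    trough: float     # minimal signal over the time course


def is_oscillating(call: RhythmCall, p_cutoff: float = 0.01,
                   min_ratio: float = 1.5) -> bool:
    """Apply the stated thresholds: p below the cutoff, 21 < period < 24 hr,
    and a peak-to-trough ratio above min_ratio."""
    if call.trough <= 0:
        return False  # ratio undefined; treated as not callable in this sketch
    return (call.p_value < p_cutoff
            and 21 < call.period_hr < 24
            and call.peak / call.trough > min_ratio)
```

Passing `p_cutoff=0.05` reproduces the looser p value threshold that the text reports for eRNAs.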

De Novo Identification of Circadian Liver 
Enhancer RNAs 

Analysis of the liver GRO-seq data revealed eRNA transcription 
in both inter- and intragenic regions, exemplified by highlighted 
regions in the vicinity of Ppara and Cry2, respectively (Figure 2A). 
To globally identify eRNA loci, we developed a pipeline to search for 
genomic locations producing bi- and unidirectional short RNA 
transcripts (Extended Experimental Procedures), which identified 19,086 
high confidence de novo eRNA loci (>300 bp from TSS) (Table S2A). The 
average GRO-seq signal of de novo eRNAs showed a bimodal profile in both 
inter- and intragenic regions (Figure 2B). Analysis of public chromatin 
immunoprecipitation sequencing (ChIP-seq) data (Table S2B) from mouse 
liver suggested that de novo eRNA loci were 
enriched for other epigenomic features including H3K27ac, 
H3K4me1, DNase I hypersensitivity, and RNA polymerase II 
(Pol2) recruitment, consistent with the function of these sites 
as enhancers (Figure 2C). eRNA signals correlated with Pol2 occupancy 
and histone acetylation but not histone methylation 
(Figure S2A), consistent with earlier reports (Hah et al., 2013; Li 
et al., 2013; Wang et al., 2011) and in agreement with the notion 
that H3K4me1 and H3K27ac mark enhancer identity and activity, 
respectively (Creyghton et al., 2010). 

To examine dynamics of eRNA transcription across the 24 hr 
cycle, eRNA transcripts were quantified using GRO-seq tag 
counts within ±500 bp from the centers of eRNA loci. Remarkably, 
5,724 (30%) of eRNAs were found to be transcribed in a circadian manner 
(JTK_CYCLE, p < 0.05, 21 < period (τ) < 24 hr, peak to trough 
ratio > 1.5) (Table S2C), and their relative 
expression peaked at different times of the day (Figure 2D). 
Based on their peak expression time (hereafter referred to as 
“phase”), circadian eRNAs were divided into eight groups 
(phase ZT0-ZT24, at 3 hr intervals), represented by eight colors 
in Figure 2D. Interestingly, circadian eRNAs were not evenly 
distributed across the eight phase groups. A total of 71% of 
circadian eRNAs oscillated with a phase between ZT18 and 
ZT3, whereas 29% of circadian eRNAs oscillated in other phases 
(Figure 2E; Table S2C). Examples of circadian eRNAs with phase 
Figure 2. De Novo Identification of Circadian Liver Enhancer RNAs 

(A) Genome browser view of intergenic (upper 
panel) and intragenic (lower panel) eRNAs (yellow 
boxes). 

(B) GRO-seq tag densities in 4 kb windows surrounding 
de novo intergenic (upper panel) and 
intragenic (lower panel) eRNA loci are shown for 
the plus (blue) and minus (red) strand. Y axis 
shows average reads per 10 million reads (RPTM) 
per 10 bp bin. 

(C) Average ChIP-seq tag densities of epigenetic 
marks in 2 kb window surrounding all de novo 
eRNA loci (prior to the selection of high confidence 
eRNAs) and matched control regions. 

(D) Heat map of the relative transcription of oscillating 
eRNAs throughout the day. Color coding of 
eRNA population in eight phase groups (from ZT0 
to ZT24, at 3 hr intervals) is shown on the right. 

(E) Rose diagram showing the prevalence of eRNA 
loci in each phase group. For each wedge, the 
color corresponds to that in (D) and the area is 
proportional to the number of eRNAs in that group. 

(F) Genome browser view of oscillating eRNAs at 
Cry1 locus. 

(G) RT-qPCR validation of circadian transcription 
for intergenic, intragenic, and noncyclic eRNAs at 
indicated gene loci. Data are expressed as mean ± 
SEM (n = 3-4 per time point) and normalized to the 
first time point. 

See also Figure S2 and Table S2. 



ZT22 at the Cry1 locus are shown in Figure 2F. eRNA transcripts 
oscillating in different phases were confirmed by RT-qPCR (Fig- 
ure 2G) at selected intergenic and intragenic eRNA loci (Fig- 
ure S2B). The unbalanced phase distribution of eRNAs agrees 
with the previous finding that histone acetylation, a reflection 
of enhancer activity, was globally high around ZT22 and low 
around ZT10 in the mouse liver (Feng et al., 2011). Moreover, 
the average H3K27ac level at eight groups of eRNA loci showed 
the same oscillatory pattern as the circadian eRNAs within each 
group (Figure S2C). Therefore, circadian eRNAs oscillate in 
diverse phases, suggesting that circadian enhancer activities 
are orchestrated by distinct mechanisms in liver. 
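
The grouping of eRNAs into eight 3 hr phase bins, and the share falling in a window that wraps past ZT24 (such as ZT18 to ZT3), can be sketched as follows. How phases on a bin boundary were assigned is not stated in the text, so the half-open bins used here are an assumption, as are the function names.

```python
def phase_group(phase_hr: float, width: int = 3) -> str:
    """Assign a peak time (in ZT hours) to a bin such as 'ZT21-ZT24'.
    Bins are half-open [start, start + width); this convention is assumed."""
    start = int((phase_hr % 24) // width) * width
    return f"ZT{start}-ZT{start + width}"


def fraction_in_window(phases, start: float, end: float) -> float:
    """Fraction of phases inside the circular window [start, end).
    When start > end the window wraps past ZT24 (e.g. ZT18 to ZT3)."""
    def inside(p: float) -> bool:
        p %= 24
        if start <= end:
            return start <= p < end
        return p >= start or p < end
    return sum(inside(p) for p in phases) / len(phases)
```

Treating phase as circular matters here: a plain numeric comparison would split the ZT18 to ZT3 window into two unrelated ranges.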

Phase-Specific Transcription Factors at Circadian 
Enhancers 

We have shown that gene body and eRNA transcription occur in 
multiple phases. As previous studies suggested correlated tran- 
scription of eRNA and nearby target genes (Core et al., 2008; 
Hah et al., 2013; Kim et al., 2010), we examined whether eRNA 
oscillations are related to circadian gene transcription. The 
expression of genes mapped closest to oscillating eRNAs (within 
200 kb from TSS) showed rhythmic patterns in phase with eRNA 
expression (Figure 3A). Among all genes mapped to circadian 
eRNAs, 423 (34%) circadian gene transcripts were mapped to 
1,124 (20%) circadian enhancers, and oscillation phases between each 
enhancer-gene pair were highly correlated (r = 0.9) (Figure S3A). This is 
likely an underestimate based on the stringent eRNA-gene mapping criteria 
and, indeed, if the analysis is 
not limited to the nearest gene, up to 76% of circadian genes 
in different phases have in-phase eRNAs (phase difference 
<3 hr between gene and eRNA) located within 200 kb of their 
TSSs. By contrast, for random genes this number is ~10% on 
average (hypergeometric test, p < 0.001) (Figure S3B). Together, 
these results suggest that circadian eRNAs predict rhythmic 
transcription of nearby genes and are likely to be functionally 
associated with circadian genes of the same phase. 
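
The in-phase criterion used above (an eRNA within 200 kb of a TSS whose phase differs from the gene's by less than 3 hr) depends on measuring phase difference on a 24 hr circle. A minimal sketch, assuming a hypothetical representation of each enhancer as a (position, phase) pair on the same chromosome as the gene:

```python
def circular_phase_diff(a: float, b: float, period: float = 24.0) -> float:
    """Smallest difference between two phases on a circular clock."""
    d = abs(a - b) % period
    return min(d, period - d)


def in_phase_enhancers(gene_tss: int, gene_phase: float, enhancers,
                       max_dist: int = 200_000, max_diff: float = 3.0):
    """Return the (position, phase) enhancers within max_dist bp of the TSS
    whose phase is within max_diff hours of the gene's phase."""
    return [(pos, phase) for pos, phase in enhancers
            if abs(pos - gene_tss) <= max_dist
            and circular_phase_diff(gene_phase, phase) < max_diff]
```

For example, phases ZT23 and ZT1 differ by 2 hr on the circle, so such a pair counts as in phase even though the raw values differ by 22.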

Although gene body and eRNA transcription occur in multiple 
phases, the core clock oscillator in liver has only one peak and 
one trough in a 24 hr period (Koike et al., 2012). We considered 
the possibility that specific circadian TFs were responsible for 
the different phases of gene expression by driving the transcrip- 
tion of diversely phased eRNAs. To this end, we performed 
motif analysis on the eight groups of circadian enhancers using 
500 bp windows centered on each eRNA locus (Figure S3C). 
First, candidate phase-specific TFs with the most enriched mo- 
tifs in each enhancer group were selected by de novo motif min- 
ing (Table S3). Then, annotated motifs of candidate TFs were 
used to quantify the motif enrichment in each enhancer group, 
revealing four major types of motifs specifically enriched in six 
enhancer groups (Figure 3B). Specifically, an E-box motif was 
the most enriched at circadian eRNA loci in phase ZT6-ZT9, 
coincident with the peak of BMAL1 binding to the genome 
Figure 3. Phase-Specific Transcription Factors at Circadian Enhancers 

(A) Relative transcription of genes closest to oscillating eRNAs (within 200 kb of TSS). 

(B) Motifs specifically enriched in each eRNA group are labeled in the clock diagram on the left. Position weight matrix (PWM) of each motif and its best enrichment 
p value in assigned groups are shown in the table on the right. 

(C) Correlation of motif occurrence and TF binding in eight eRNA phase groups. In each plot, the red dots represent the fraction of eRNA loci bound by the 
indicated TF (top 3,000 ChIP-seq peaks), and black bars represent the fraction of eRNA loci containing the corresponding motif. Correlation coefficient r is shown 
for phase-specific motifs. TFs recognizing different types of motifs are grouped in colored boxes corresponding to those used for eRNA phases. 

See also Figure S3 and Table S3. 



(Koike et al., 2012; Rey et al., 2011; Ripperger and Schibler, 
2006). However, although BMAL1/CLOCK has been previously 
linked to circadian gene regulation in liver, the ZT6-ZT9 eRNAs 
comprised only ~6% of circadian enhancers, consistent with an 
earlier study in which only ~5% of total circadian genes were 
transcribed in phase with nearby BMAL1 binding (Menet et al., 
2012). 

We also discovered that a D-box motif, recognized by PAR-bZIP 
proteins including DBP, TEF, HLF, and E4BP4 (Cowell 
et al., 1992; Li and Hunger, 2001; Mitsui et al., 2001), was the 
most enriched motif at phase ZT9-ZT15 eRNA loci (Figure 3B), 
coinciding with the phase of known target genes for these TFs 
(Gachon et al., 2006). Moreover, the RevDR2 and RORE motifs, 
bound by Rev-erbα/β (Harding and Lazar, 1995) and RORα/γ 
(Giguere et al., 1994), were the top motifs at eRNA loci with the 
most common phase, ZT18-ZT24 (Figure 3B), coinciding with 
the trough of repression by Rev-erbα (Bugge et al., 2012; Feng 
et al., 2011). By contrast, motifs characteristic of ETS binding 
sites were highly enriched in the phase ZT0-ZT3 enhancers, 
implying a potential role of ETS proteins in the circadian regulation 
of transcripts with this phase (Figure 3B). In addition to these 
phase-specific motifs, constitutively enriched motifs in all 
enhancer groups were identified, most prominently the Forkhead 
and HNF4 motifs (Figure 3B). 

We tested whether the motif enrichment in a given eRNA 
group was predictive of TF binding by overlapping each group 
of circadian eRNAs with TF cistromes determined by ChIP-seq. 
Specifically, we analyzed previously published cistrome 
data for core clock TFs (Feng et al., 2011; Koike et al., 2012) 
and performed additional ChIP-seq experiments for E4BP4 
and RORα. To minimize the effects of variable ChIP-seq quality 
in different studies, only the 3,000 strongest ChIP-seq peaks 
for each TF were used in the analysis. Notably, the genomic 
binding sites of E-box-binding factors BMAL1, CLOCK, and 
NPAS2 were enriched at eRNAs with phase ZT6-ZT9 (Figure 3C), 
where de novo analysis implicated the E-box motif. Similarly, 
genomic binding of Rev-erbα and RORα was enriched at eRNAs 
whose transcription peaked at ZT21-ZT24 (Figure 3C), where the 
RevDR2 and RORE motifs were most prominent. Also consistent 
with the bioinformatic predictions, the D-box binding factor 
E4BP4 bound most commonly at eRNAs with phase ZT9-ZT15 
(Figure 3C). By contrast, binding of FOXA1 and HNF4A, whose 
motifs were equally enriched in all eRNA groups, did not display 
a preference for eRNA loci of a specific phase (Figure 3C). Thus, 
the regulatory activities of six TFs coincide with the rhythmic 
eRNA expression in the enhancer group at which they were en- 
riched. These data strongly suggest that TFs bound specifically 
at each enhancer group are potential drivers of their circadian 
transcription and enhancer activities. 

Phase Correlation between eRNA and Gene Body 
Transcription Marks Functional Enhancers of 
Circadian Genes 

We next considered whether the specific TFs found to bind 
at circadian enhancers were driving transcription of nearby in- 
phase genes, focusing on the most common circadian en- 
hancers (phase ZT18-ZT24). Within 200 kb of 325 circadian 
genes in phase ZT18-ZT24, 539 neighboring eRNA loci showed 
circadian eRNA transcription in phase ZT18-ZT24 (“correlated 
enhancers”), while 857 eRNA loci did not produce correlated 
eRNA transcription (“noncorrelated enhancers,” eRNA expression 
ZT22/ZT10 < 1.5) (Figure 4A). 

Correlated enhancers showed higher enrichment of the 
RevDR2 and RORE motifs in comparison to noncorrelated en- 
hancers (Figure 4B). Notably, relative enrichment of the RevDR2 
motif, which is a preferential binding site for Rev-erbα (Harding 
and Lazar, 1995; Zhao et al., 1998), was 2-fold higher than that 
of the RORE motif shared by Rev-erbα and RORα (Giguere 
et al., 1994), suggesting that Rev-erbα may play a more important 
role in regulating the correlated enhancers. ChIP-seq tag 
densities of Rev-erbα and its corepressor HDAC3 were dramatically 
stronger at correlated enhancers than at noncorrelated 
enhancers (Figure 4C), supporting the idea that the correlated 
enhancers in phase ZT18-ZT24 were controlled by Rev-erbα. 
To test this hypothesis, GRO-seq was performed on livers from 
mice genetically lacking Rev-erbα (Rev-erbα−/−) at ZT10, when 
Rev-erbα levels normally peak and maximally repress histone 
acetylation and gene transcription (Feng et al., 2011). Indeed, 
eRNA signals at the correlated enhancers were markedly derepressed 
in Rev-erbα−/− mice, while no such change was seen 
at the noncorrelated enhancers (Figure 4D). Similar results 
were obtained at both inter- and intragenic enhancers (Figure S4). 
Importantly, gene body transcription that normally 
peaked at ZT18-ZT24 was also extensively derepressed in 
Rev-erbα−/− mice at ZT10 (Figure 4E), indicating these genes 
are direct targets of Rev-erbα. Together, these results demonstrate 
that eRNAs in phase ZT18-ZT24 mark functional Rev-erbα 
binding sites that regulate neighboring target genes with 
correlated phase. Conversely, noncorrelated enhancers are not 
bound by Rev-erbα and do not control Rev-erbα target genes. 

Circadian eRNAs Reveal the Functional Rev-erbα 
Cistrome at Oscillating Genes 

The findings to this point demonstrate that Rev-erbα regulates 
circadian genes in phase ZT18-ZT24 via enhancers oscillating 
in phase with gene body transcription. However, these enhancers 
account for only a small fraction of the complete Rev-erbα 
cistrome (Feng et al., 2011). We therefore considered 
whether circadian eRNAs in phase ZT18-ZT24 uniquely mark 
the functional subset of Rev-erbα binding sites controlling circadian 
genes in liver. To test this, Rev-erbα sites near circadian 
genes were divided into three groups, of which 887 (33%) overlapped 
de novo eRNA loci, 347 (13%) were found at TSSs of 
circadian genes (within 300 bp), and the remaining 1,455 (54%) 
were not associated with detectable eRNA transcription (Figure 5A). 
Of the eRNAs transcribed at Rev-erbα binding sites, 
30% peaked at ZT18-ZT24, while 19% peaked in other phases, 
and 51% were constitutively expressed eRNA and did not oscillate 
(Figure 5A). 

Rev-erbα and its corepressor HDAC3 bound more strongly at 
sites producing ZT18-ZT24 eRNAs than at other types of binding 
sites (Figure 5B), resulting in a marked decrease in histone H3K9 
acetylation from ZT22 to ZT10 (Figure S5). To directly assess the 
functionality of Rev-erbα binding on individual gene expression, 
we constructed a list of high confidence target genes whose 
nascent and mature transcripts were derepressed in Rev-erbα−/− 
livers at ZT10 compared to wild-type (WT) (Tables S4A-S4C). The 
enrichment of derepressed circadian genes in Rev-erbα−/− mice 
was >3-fold higher near Rev-erbα sites producing ZT18-ZT24 
eRNAs, compared to other Rev-erbα sites (Figure 5C), suggesting 
that ZT18-ZT24 eRNAs mark functional Rev-erbα sites. 
Moreover, circadian genes with phase around ZT21-ZT24 were 
highly enriched for derepression in Rev-erbα−/− mice (Figure 5D), 
consistent with the enrichment of circadian eRNAs in this phase. 
Together, these data strongly suggest that only a subset of the 
Rev-erbα cistrome associated with antiphase eRNAs is functional 
in controlling circadian gene transcription. 
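
Enrichment comparisons of this kind are reported in the figure legends as hypergeometric tests. The sketch below shows how an upper-tail hypergeometric p value and a fold enrichment could be computed from four counts. The function names and the reading of the counts (background size, annotated genes, sampled genes, overlap) are assumptions for illustration, and no values from the study are reproduced.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for a hypergeometric draw: n genes sampled without
    replacement from a background of N genes, K of which are annotated
    (e.g. derepressed); k is the observed overlap."""
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def fold_enrichment(k: int, N: int, K: int, n: int) -> float:
    """Observed overlap fraction divided by the background fraction."""
    return (k / n) / (K / N)
```

Exact integer binomials keep the tail probability accurate for small counts; for genome-scale counts a statistics library would normally be used instead.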

eRNA Analysis Identifies E4BP4 as a Key Mediator of 
Gene Activation by Rev-erbα 

While eRNAs clearly delineate the functional Rev-erbα cistrome 
responsible for direct transcriptional repression, there remains a 
substantial set of genes paradoxically downregulated at ZT10 in 
Rev-erbα−/− mouse livers, which cannot be explained through 
direct regulation by Rev-erbα. To identify factors mediating this 
opposing effect on gene transcription, we constructed a list of 
high confidence target genes whose nascent and mature transcript 
levels were decreased in Rev-erbα−/− livers at ZT10 
Figure 4. Phase Correlation between eRNA and Gene Body Transcription Marks Functional Enhancers of Circadian Genes 

(A) Heatmap of the relative transcription of 325 circadian genes in phase ZT18-ZT24 (left) and their neighboring eRNAs (right). A total of 539 eRNAs in correlated 
phase are shown in the red box while 857 noncorrelated eRNAs are in the blue box. 

(B) Enrichment of RevDR2 and RORE motif in correlated eRNA loci relative to noncorrelated eRNA loci (hypergeometric test, ***p < 0.001). 

(C) ChIP-seq tag density of Rev-erbα (left) and HDAC3 (right) in 2 kb windows surrounding correlated (red) and noncorrelated eRNA loci (blue). Y axis shows the 
average tag count per 10 bp bin normalized to 10 million total reads. 

(D) Comparison of GRO-seq tag density (RPTM per 10 bp bin in 2 kb window) surrounding correlated (left) and noncorrelated (right) eRNA loci in WT and 
Rev-erbα−/− livers at ZT10. 

(E) Heatmap of transcriptional changes between WT and Rev-erbα−/− livers at ZT10, for the 325 circadian genes in phase ZT18-ZT24 (right column), compared to 
the same number of random genes (left column). Data are expressed as log2 fold change. 

See also Figure S4. 
Figure 5. Circadian eRNAs Reveal the Function of the Rev-erbα 
Cistrome at Oscillating Genes 

(A) Distribution of Rev-erbα ChIP-seq peaks near 
circadian genes (upper panel) and subdistribution 
of eRNA-producing Rev-erbα peaks near circadian 
genes (lower panel). 

(B) Boxplot showing Rev-erbα and HDAC3 peak 
height at binding sites from (A). Y axis indicates 
normalized tag count in each peak (RPTM) (***p < 
0.001, one-way ANOVA and Tukey’s test). 

(C) Enrichment of derepressed genes in Rev-erbα−/− 
mice at circadian genes bound by 
different Rev-erbα peaks from (A) relative to a 
random set of Rev-erbα peaks (hypergeometric 
test, ***p < 0.001). 

(D) Enrichment of derepressed genes in Rev-erbα−/− 
mice in 8 groups of circadian genes with 
indicated phases relative to randomly selected 
genes (hypergeometric test, ***p < 0.001). 

See also Figure S5 and Table S4. 



(Tables S4A-S4C). Profiling of eRNAs near genes that were 
downregulated in the Rev-erbα−/− livers revealed a marked 
and specific enrichment for phases between ZT9 and ZT15 (Figure 6A), 
which were shown earlier to be enriched for the D-box 
motif and binding of the D-box repressor E4BP4 (Figure 3C). 

We hypothesized that, by controlling the circadian expression 
of E4BP4, Rev-erbα indirectly dictated the circadian expression 
of a large set of genes controlled by D-box enhancers whose 
expression would thus be in phase with Rev-erbα. Indeed, 
E4BP4 gene expression was circadian in WT mouse livers but 
constitutively elevated in Rev-erbα−/− mice (Figure 6B), consistent 
with a previous report (Duez et al., 2008). Furthermore, 
Rev-erbα bound along with its NCoR-HDAC3 corepressor complex 
to several sites at the E4BP4 (Nfil3) locus, suggesting that 
E4BP4 expression is directly controlled by Rev-erbα (Figure S6A). 
By contrast, there were weaker changes in hepatic 
expression of D-box activating factors Dbp, Tef, and Hlf in livers 
of Rev-erbα−/− mice, and the expression of these factors 
remained circadian with similar phases (Figure S6B). 



To identify putative functional E4BP4 sites, we analyzed the complete 
set of E4BP4 ChIP-seq peaks for those with higher eRNA levels at 
ZT9-ZT15 (ZT10/ZT22 > 3 or ZT13/ZT1 > 3). These sites, which we refer 
to as “E4BP4+eRNA” sites, were enriched 2-fold around genes 
downregulated in Rev-erbα−/− mice (Figure 6C), demonstrating a 
significant association between E4BP4 binding and gene regulation 
downstream of Rev-erbα. Transcriptome profiles from livers of WT mice 
(Hughes et al., 2009) confirmed that putative E4BP4 target genes 
(downregulated in Rev-erbα−/− livers and near E4BP4+eRNA sites) were 
generally circadian with average peak and trough expression in phase with 
Rev-erbα and E4BP4 levels, respectively (Figure 6D, green 
line). The average GRO-seq transcription profile for this same 
group of genes showed a similar pattern over a 24 hr cycle 
(Figure 6D, blue line). Both patterns are consistent with direct 
repression by E4BP4 leading to circadian oscillation in phase 
with Rev-erbα protein levels. In contrast, Rev-erbα target genes 
(upregulated in Rev-erbα−/− livers and near Rev-erbα sites 
overlapping ZT18-ZT24 eRNAs) were on average antiphase to 
Rev-erbα expression in WT livers, consistent with direct 
transcriptional repression by Rev-erbα (Figure 6E). As a control, 
genes that were expressed near oscillating eRNAs, but unchanged 
in the Rev-erbα−/− livers, were not systematically 
phased relative to Rev-erbα or E4BP4 levels (Figure 6F). 
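
The ratio thresholds used to define “E4BP4+eRNA” sites (ZT10/ZT22 > 3 or ZT13/ZT1 > 3) amount to a simple per-site test. In the sketch below, the dictionary layout (ZT hour mapped to eRNA level), the function name, and the constant `E4BP4_PAIRS` are hypothetical; the comparison is written without division so that a zero denominator needs no special handling.

```python
# Hypothetical (numerator ZT, denominator ZT) pairs taken from the stated criteria.
E4BP4_PAIRS = [(10, 22), (13, 1)]


def passes_ratio(levels: dict, pairs, min_ratio: float = 3.0) -> bool:
    """True if, for any (num, den) pair of time points, the eRNA level at
    `num` exceeds min_ratio times the level at `den`."""
    return any(levels[num] > min_ratio * levels[den] for num, den in pairs)
```

A site with levels of 9 at ZT10 and 2 at ZT22 passes, whereas a site with exactly a 3-fold ratio does not, because the stated threshold is a strict inequality.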

These findings support a model in which Rev-erbα indirectly 
activates genes in phase ZT9-ZT15 by repressing the D-box 
repressor E4BP4. Such a model predicts that E4BP4 target 
genes would be constitutively downregulated in Rev-erbα−/− 
livers, with increased E4BP4 binding at nearby functional sites. 
Indeed, expression profiling over a 24 hr cycle revealed that 
Figure 6. E4BP4 Functions Downstream of Rev-erbα 

(A) Enrichment of oscillating eRNAs in each phase group near genes downregulated in Rev-erbα−/− livers relative to control genes. Significantly enriched phases 
are noted as corresponding to D-box/E4BP4-enriched phase group (hypergeometric test, *p < 0.05). 

(B) mRNA expression of E4BP4/Nfil3 in WT and Rev-erbα−/− livers measured by RT-qPCR throughout the day. Data are expressed as mean ± SEM (n = 2 per time 
point and genotype) normalized to the first WT time point. 

(C) Enrichment of E4BP4+eRNA bound genes among those downregulated (green) or upregulated (red) in Rev-erbα−/− livers relative to unchanged genes (gray) 
(hypergeometric test, *p < 0.05). 

(D-F) Average circadian expression profiles in WT mouse livers (Hughes et al., 2009) and corresponding transcription profiles by GRO-seq for (D) genes 
downregulated in Rev-erbα−/− livers within 200 kb of E4BP4 binding at ZT9-ZT15 circadian eRNAs, (E) genes upregulated in Rev-erbα−/− livers within 200 kb of 
Rev-erbα binding at ZT18-ZT24 circadian eRNAs, and (F) nonregulated control genes (expressed in liver within 200 kb of circadian eRNA in any phase). 

(G) ChIP-qPCR of E4BP4 binding at genes downregulated in Rev-erbα−/− livers at ZT10. Binding is shown at ZT10 (solid bars) and ZT22 (hashed bars) in WT (blue) 
and Rev-erbα−/− (orange) livers. Data are expressed as mean ± SEM (one-way ANOVA, *p < 0.05, ^p < 0.1, n = 3-4 per group). 

(H) mRNA expression measured by RT-qPCR in liver overexpressing Rev-erbα (mice injected with AAV-Tbg-Rev-erbα) or control liver (mice injected with 
AAV-Tbg-GFP) at ZT22. Data are expressed as mean ± SEM (one-way ANOVA, *p < 0.05, ^p < 0.1; n = 6 per group). 

(I) ChIP-qPCR of E4BP4 binding at same sites as (G) in liver overexpressing Rev-erbα (orange) or control liver (blue). Data are expressed as mean ± SEM (one-way 
ANOVA, *p < 0.05, ^p < 0.1, n = 5-6 per group). 

See also Figure S6. 
genes near E4BP4+eRNA sites showed attenuated rhythmic 
expression in Rev-erbα−/− livers (Figure S6C). Furthermore, 
E4BP4 genomic binding was increased at ZT10 and no longer 
circadian at these sites in Rev-erbα−/− livers (Figure 6G). 

We also tested the effect of ectopic expression of Rev-erbα in 
mouse livers on E4BP4 expression and function. Interrogation of 
data from a previously published experiment (Kornmann et al., 
2007) revealed upregulation of the genes putatively controlled 
by E4BP4 in livers constitutively expressing Rev-erbα, particularly 
at the physiological peak time of E4BP4 expression (Figure S6D). 
Indeed, while constitutive expression of Rev-erbα in 
mouse liver repressed its direct targets such as Bmal1 and 
E4BP4/Nfil3, it upregulated E4BP4 target genes at ZT22 (Figure 6H). 
This effect was much less apparent at ZT10 when 
E4BP4 is already at physiologically low levels (Figure S6E). 
Importantly, E4BP4 binding at putative functional sites near 
these genes was reduced at ZT22, consistent with loss of repression 
by E4BP4 at the implicated D-box elements (Figure 6I). 
These results strongly suggest that E4BP4 functions downstream 
of Rev-erbα, via sites transcribing eRNA in phase ZT9-ZT15, 
to repress the genes that are downregulated in Rev-erbα−/− 
livers and upregulated when Rev-erbα is overexpressed. 

Circadian eRNAs Define Functional Cistromes that 
Distinguish CLOCK and Rev-erbα Target Genes 

CLOCK and Rev-erbα have opposite effects on gene transcription; 
however, their maximal binding to the genome occurs in 
roughly the same time window (ZT8-ZT10) (Cho et al., 2012; 
Feng et al., 2011; Koike et al., 2012). ChIP-seq results suggest 
that 80% of genes bound by CLOCK within 200 kb of TSS 
were also bound by Rev-erbα (Figure S7A), resulting in 15%-35% 
of circadian genes in different phases cobound by these 
two factors (Figure S7B). The question as to how co-occurrence 
of CLOCK and Rev-erbα binding affects rhythmic gene transcription 
remains unsolved (Zhao et al., 2014). 

Having demonstrated that functional Rev-erba sites marked 
by ZT18-ZT24 eRNAs correlated with target gene phase (Figures 
4 and 5), we tested whether eRNAs oscillating in other phases 
could identify the functional cistromes of other clock compo- 
nents. To this end, we analyzed published microarray data 
measuring gene expression in livers of WT and Clock mutant 
mice (Miller et al., 2007). We first noted that genes downregu- 
lated in the Clock mutant mice were significantly enriched for 
circadian eRNAs in the phase ZT6-ZT9 compared to control 
genes (Figure S7C), corresponding to the enrichment of E-box 
motif and CLOCK binding. We then selected putatively functional 
CLOCK sites (Koike et al., 2012) that produced eRNAs in 
phase with CLOCK binding (Table S5, eRNA level ZT7/ZT19 > 
3 or ZT10/ZT22 > 3) and that correlated with nearby gene transcription, 
and compared these sites to the remainder of the CLOCK 
cistrome. 
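
The ratio criterion quoted from Table S5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the site names, the eRNA levels, and the dictionary layout are hypothetical, and the additional requirement that eRNA levels correlate with nearby gene transcription is not modeled here.

```python
# Hypothetical eRNA levels (arbitrary units) at four time points per CLOCK site.
sites = {
    "site_a": {"ZT7": 9.0, "ZT19": 2.0, "ZT10": 4.0, "ZT22": 3.0},
    "site_b": {"ZT7": 3.0, "ZT19": 2.5, "ZT10": 2.0, "ZT22": 2.0},
    "site_c": {"ZT7": 2.0, "ZT19": 1.0, "ZT10": 8.0, "ZT22": 2.0},
}


def in_phase_with_clock(levels, min_ratio=3.0):
    """Return True if ZT7/ZT19 > min_ratio or ZT10/ZT22 > min_ratio."""

    def ratio(peak, trough):
        # Avoid dividing by zero: a positive peak over a zero trough
        # is treated as an infinite ratio, zero over zero as no signal.
        if levels[trough] <= 0:
            return float("inf") if levels[peak] > 0 else 0.0
        return levels[peak] / levels[trough]

    return ratio("ZT7", "ZT19") > min_ratio or ratio("ZT10", "ZT22") > min_ratio


# Sites passing the ratio rule versus the remainder of the (toy) cistrome.
functional = sorted(name for name, lv in sites.items() if in_phase_with_clock(lv))
remainder = sorted(set(sites) - set(functional))
```

With these toy values, `site_a` and `site_c` pass the rule and `site_b` falls into the remainder group.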

Target genes within 200 kb of putatively functional CLOCK 
sites showed rhythmic mRNA expression in WT mice (Miller 
et al., 2007), peaking at the time point corresponding to ZT10 
in our studies (Figure 7A, yellow line). These genes also showed 
reduced expression overall in Clock mutant mice, particularly at 
time points corresponding to ZT6 and ZT10 (Figure 7A, orange 
line). By comparison, genes near other CLOCK sites showed 
weaker average rhythm and weaker average reduction in Clock 
mutant mice (Figure 7B). Further confirming that CLOCK sites 
marked by in phase eRNA represent the functional subset of 
the CLOCK cistrome, target genes near these sites are signifi- 
cantly enriched for circadian genes specifically in phases ZT6- 
ZT12, but not opposing phases (Figure 7C) and are also significantly 
enriched for genes downregulated >1.5-fold in Clock 
mutants (Figure 7D). The fact that mRNA levels of some CLOCK 
target genes cycle in phases ZT9-ZT12 is likely due to delays in 
the phase of mature mRNA oscillations relative to nascent transcription, 
as noted in previous studies (Menet et al., 2012). Taken 
together, these results demonstrate that CLOCK sites marked by 
in phase eRNAs represent the functional component of the total 
cistrome. 
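
The enrichment comparisons in Figures 7C and 7D are reported as hypergeometric tests. A minimal sketch of such a one-sided test is given below, using only the Python standard library. The function name and all counts are hypothetical and chosen purely for illustration; they are not taken from the data in this paper.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which belong to the category of interest."""
    upper = min(K, n)
    total = comb(N, n)
    # Sum the exact probabilities of observing k, k+1, ..., upper hits.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Hypothetical counts, for illustration only.
background_genes = 20000    # all genes considered
circadian_in_phase = 1500   # e.g., circadian genes in one phase window
target_genes = 300          # genes near the selected binding sites
overlap = 60                # target genes that are also in the phase window
p_value = hypergeom_sf(overlap, background_genes, circadian_in_phase, target_genes)
```

Exact integer arithmetic is used on purpose, so the tail probability does not suffer from floating-point underflow in the individual terms.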

To examine whether CLOCK and Rev-erba are both functional 
at cobound circadian genes, functional binding sites of each 
factor were mapped to their closest circadian genes. CLOCK 
binding sites at TSS were included in this analysis as they are 
also enriched at genes downregulated in Clock mutant mouse 
livers (Figure S7D), consistent with previous studies (Rey et al., 
2011). Remarkably, the majority of cobound circadian genes 
contained functional binding sites of only one factor but not 
both, with genes around phase ZT6-ZT9 and ZT18-ZT24 most 
enriched for functional CLOCK and Rev-erba sites, respectively 
(Figure 7E). These findings suggest exclusive functions of either 
CLOCK or Rev-erba at most cobound genes. Consistent with 
this notion, expression profiling showed that cobound genes 
exclusively carrying functional CLOCK sites, such as Nr1d1, 
Nr1d2, and Tef, are deactivated in Clock mutant mice, while 
those only carrying functional Rev-erba sites, such as Cry1 
and E4BP4, are derepressed in Rev-erbα−/− mice (Figures 7F 
and S7E). Therefore, despite frequent colocalization of their 
binding, CLOCK and Rev-erba control distinct sets of circadian 
genes that can be predicted from their regulation of eRNAs. 

DISCUSSION 

Unbiased analysis of the nascent transcription of over 5,000 
circadian eRNAs and the TF motifs at these sites has allowed 
us to identify the direct genomic targets of multiple circadian reg- 
ulators in mouse liver. Circadian eRNA loci are enriched for 
enhancer marks, the phase of eRNA oscillation correlated with 
that of nearby genes, and knockout studies demonstrated the 
causal relationship between TF binding and the transcriptional 
regulation at enhancers and the genes they control. These re- 
sults informed the comparison of cistromes with gene expres- 
sion and thus revealed the functional cistromes of multiple TFs 
that bind at thousands of genomic sites in liver. 

Previous genomic studies of circadian gene regulation have 
focused primarily on the core clock components BMAL1/ 
CLOCK, which bind DNA with a uniform genome-wide phase 
peaking at ZT6-ZT9 (Hatanaka et al., 2010; Koike et al., 2012; 
Menet et al., 2012; Rey et al., 2011; Yoshitane et al., 2014), yet 
only a small fraction of circadian gene transcription is in this 
phase. Our data suggest that only the genes with phase ZT6- 
ZT9 are the true BMAL1/CLOCK targets, while many other genes 
are bound, but not controlled, by BMAL1/CLOCK possibly due 
to inactive binding or long distance looping to different genes. 
Figure 7. Circadian eRNAs Define Functional 
Cistromes that Distinguish CLOCK 
and Rev-erbα Target Genes 

(A) Average expression of genes within 200 kb of 
CLOCK binding sites producing eRNA in phase 
with CLOCK binding and target gene expression 
in WT (yellow line) and Clock mutant (orange line) 
mouse livers from Miller et al. (2007) (Wilcoxon 
test of gene fold-change distribution versus 
matching time points in (B), *p < 0.05). 

(B) Average expression of genes within 200 kb of 
CLOCK binding sites lacking in-phase eRNA in 
WT (dark gray line) and Clock mutant (light gray 
line) mouse livers from Miller et al. (2007). 

(C) Enrichment of circadian genes expressed in 
phase with CLOCK binding (ZT6-ZT12) or antiphase 
to CLOCK binding (ZT18-ZT24) for the 
gene groups used in (A) (yellow) and (B) (gray) 
relative to random genes (hypergeometric test, 
*p < 0.05). 

(D) Enrichment of genes downregulated in Clock 
mutant livers among the gene groups used in 
(A-C) relative to random genes (hypergeometric 
test, *p < 0.05). 

(E) Fraction of oscillating genes cobound by 
CLOCK and Rev-erbα that are within 200 kb of TF 
binding sites producing rhythmic eRNA in phase 
with CLOCK activation (blue), Rev-erbα repression 
(red), or both (green). Oscillating genes are 
divided according to their phases. Representative 
genes are noted in each group. 

(F) Enrichment of CLOCK and Rev-erbα regulated 
genes (expression fold change in mutant >95% of 
random genes) in those with eRNA predicted 
functional binding sites in (E), relative to random 
genes (hypergeometric test, ***p < 0.001, not 
significant [N.S.] p > 0.05). 

See also Figure S7 and Table S5. 



Moreover, despite extensive binding region overlap with Rev- 
erba (Cho et al., 2012), whose repressive activity would conflict 
with activation by BMAL1/CLOCK, our results demonstrate on 
a genome-wide scale that enhancer activity is primarily con- 
trolled by one factor or the other. 

Importantly, our unbiased identification of enhancers revealed 
not only the ZT6-ZT9 enhancers marked by E-box motifs and 
bound by BMAL1, NPAS2, and CLOCK but also more abundant 
sets of enhancers in other phases. Those peaking at ZT0-ZT3, 
ZT9-ZT15, and ZT18-ZT24 were enriched for ETS, D-box, and 
RevDR2/RORE motifs, respectively. The ETS motif is recognized 
by a large family of TFs (Hollenhorst et al., 2011), some of which 
have recently been implicated in circadian biology and will be 
the focus of future research (Anafi et al., 2014; Ciarleglio et al., 
2014). Moreover, by integration of enhancer sites with cistromic 
data, E4BP4 emerged as a key regulator of the ZT9-ZT15 D-box 
enhancers in normal liver, as well as in the Rev-erbα−/− livers, and 
Rev-erba was clearly a strong antiphase repressor bound to 
RevDR2/RORE sites at ZT18-ZT24 enhancers. 



Interestingly, the phase of circadian en- 
hancers exhibited an uneven distribution, 
with 42% of circadian eRNAs peaking 
during the late night (ZT18-ZT24), while rhythmic gene transcription 
was more evenly distributed across all phases. A possible 
explanation is that genes whose transcription 
peaks in the light cycle may be regulated primarily at promoters. 
For example, BMAL1 controls gene transcription at both promoters 
and enhancers (Rey et al., 2011), whereas Rev-erbα, the 
main controller of the ZT18-ZT24 phase, binds mainly intergenically 
(Feng et al., 2011; Lam et al., 2013). The overabundance of enhancers 
in phase ZT18-ZT24 is surprising, yet remarkably consistent 
with the previously unexplained finding of Koike et al. (2012) 
that the global peak of initiated Pol2 occurs at ~ZT22-ZT24. 

Analysis of oscillating eRNAs in mice fed normal chow ad libi- 
tum did not reveal the motifs for TFs previously suggested to 
entrain liver circadian gene expression to feeding/fasting cycles, 
such as CREB, SREBP, PPARs, and FOXO1 (Adamovich et al., 
2014; Eckel-Mahan et al., 2013; Vollmers et al., 2009). Some of 
these TFs, such as CREB and SREBP, bind preferentially to pro- 
moters of target genes (Everett et al., 2013; Gilardi et al., 2014; 
Seo et al., 2009), which would not be captured by analysis of 
eRNAs. Phase-specific enrichment could also have been 
masked by motifs bound by constitutive liver TFs, such as 
HNF4A and FOXA1, that bind at enhancers in all phases. It will 
be interesting to profile eRNAs under altered dietary conditions 
in future studies to examine the interplay between metabolic 
cues and circadian rhythms at enhancers. 

Rev-erbα expression and repressive function peak at ZT10 in 
liver, thereby orchestrating circadian transcription in the 
opposing phase (ZT22) (Feng et al., 2011). Consistent with this, 
recruitment of Rev-erba and its corepressor was strongest at 
sites of ZT18-ZT24 eRNA transcription. It should be noted that 
the entire Rev-erba cistrome in liver includes thousands of other 
binding sites, with <10% characterized by rhythmic eRNAs anti- 
phase to Rev-erba binding. Deletion of Rev-erba specifically 
activated transcription of these eRNAs, as well as the genes 
they control, thus clearly delineating the functional component 
of the Rev-erba cistrome. 

In addition to the direct regulation of circadian genes antiphase 
to Rev-erba expression, we uncovered a large set of in-phase 
circadian transcripts that were downregulated in the absence 
of Rev-erba, contrary to its powerful repressive function. Func- 
tional enhancer analysis suggested that the downregulated 
genes in Rev-erbα−/− mice were mediated by D-box factors, 
including E4BP4, a direct target of Rev-erba. While the direct 
regulation of E4BP4 by Rev-erba has been recognized (Duez 
et al., 2008), a relatively small number of E4BP4 target genes 
have been identified in liver, based primarily on in vitro studies 
of proximal promoter constructs (Tong et al., 2010; Ueda et al., 
2005). Our study includes a ChIP-seq study of E4BP4 in liver, 
and our integrative analysis demonstrates the extensive, 
genome-wide effects of this pathway, revealing how a single 
TF, such as Rev-erba, can regulate opposing phases of circadian 
gene expression by its direct and indirect actions. 

Together, the present studies reveal mechanisms for gener- 
ating and coordinating multiple phases of circadian transcription 
in a single organ. They also demonstrate that the unbiased anal- 
ysis of enhancer activity and correlated gene expression is a 
powerful method of discovering relevant TFs and their specific 
functional cistromes, which can be more generally applied to un- 
derstanding the transcriptional regulation of physiology and dis- 
ease states. 

EXPERIMENTAL PROCEDURES 
Mice 

WT C57BL/6 mice were purchased from the Jackson Laboratories. The 
Rev-erbα−/− mice were obtained from B. Vennstrom and backcrossed for at least 
seven generations with C57BL/6 mice. WT and mutant male 
mice (10- to 12-week-old) were housed under standard 12 hr light/12 hr dark cycles, 
with lights on (ZT0) at 7 a.m. and lights off (ZT12) at 7 p.m. and euthanized 
at indicated times. All animal care and procedures followed the guidelines of the 
Institutional Animal Care and Use Committee of the University of Pennsylvania. 

Antibodies 

E4BP4 antibodies (Santa Cruz sc-9550 and sc-9549) were mixed in 1:1 ratio 
for ChIP. RORα antibody was purchased from Santa Cruz (sc-6062). 

GRO-Seq 

The GRO-seq was performed as previously described (Core et al., 2008; 
Step et al., 2014; Wang et al., 2011). Raw data are available in Gene Expression 
Omnibus (GEO) (GSE59486). See also Extended Experimental 
Procedures. 

De Novo Identification of eRNAs 

A pipeline was constructed for genome-wide de novo identification of eRNA 
loci. See also Extended Experimental Procedures. 

Analysis of Oscillating Gene Transcripts and eRNAs 

RPKTM values across all time points for each transcript and eRNA feature 
were analyzed for significant circadian oscillations using JTK_CYCLE (Hughes 
et al., 2010). Motif mining at oscillating eRNAs was performed by applying 
HOMER to the 500 bp window centered on each locus. See also Extended 
Experimental Procedures. 
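
To illustrate what assigning a phase to an oscillating feature involves, the sketch below fits a single 24 hr cosine to evenly spaced samples. Note that this is a simple harmonic (cosinor-style) estimate, not JTK_CYCLE, which is a nonparametric rank-based method. The sampling times and expression values are hypothetical, and the closed-form coefficients are only valid when samples are evenly spaced over a whole number of periods.

```python
import math


def cosinor_phase(times_h, values, period_h=24.0):
    """Estimate mean, amplitude, and peak phase (hours) of a single-harmonic fit.

    Assumes samples are evenly spaced and cover a whole number of periods,
    so that the cosine and sine regressors are orthogonal.
    """
    n = len(values)
    omega = 2.0 * math.pi / period_h
    mean = sum(values) / n
    # Projections of the signal onto the cosine and sine components.
    a = 2.0 / n * sum(v * math.cos(omega * t) for t, v in zip(times_h, values))
    b = 2.0 / n * sum(v * math.sin(omega * t) for t, v in zip(times_h, values))
    amplitude = math.hypot(a, b)
    # Time of the fitted peak, wrapped into [0, period_h).
    phase_h = (math.atan2(b, a) / omega) % period_h
    return mean, amplitude, phase_h


# Hypothetical feature sampled every 3 hr, constructed to peak at ZT10.
times = [1, 4, 7, 10, 13, 16, 19, 22]
values = [10 + 4 * math.cos(2 * math.pi * (t - 10) / 24) for t in times]
mean, amplitude, phase_h = cosinor_phase(times, values)
```

A fit of this kind gives a phase estimate but no significance call; a dedicated rhythmicity test is still needed to decide whether a feature oscillates at all.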

Gene and eRNA Expression Analysis 

Total RNA was extracted from liver using the RNeasy Mini Kit (QIAGEN) 
and treated with DNase (QIAGEN). RNA was reverse transcribed using 
the High-Capacity cDNA Reverse Transcription Kit (Applied Biosystems). 
Quantitative PCR was performed with Power SYBR Green PCR Mastermix 
on the PRISM 7500 (Applied Biosystems) and analyzed by the standard 
curve method. Gene or eRNA expression was normalized to mRNA levels 
of housekeeping gene 36B4 (Arbp). Primer sequences can be found in 
Table S1D. 

Microarray Analysis 

Microarray analysis of WT and Rev-erbα−/− livers (n = 5) was performed by the 
Penn Microarray Core. Raw data are available from GEO (GSE59460). See also 
Extended Experimental Procedures. 

ChIP 

ChIP-qPCR and ChIP-seq experiments were performed as described (Feng 
et al., 2011) with minor changes. Raw data for RORa and E4BP4 ChIP-seq 
are available in GEO (GSE59486). See also Extended Experimental 
Procedures. 

ChIP-Seq Data Analysis 

Sequenced reads were aligned to the mouse reference genome (mm9) and 
peak calling was performed with HOMER (Heinz et al., 2010). Sources of public 
ChIP-seq data analyzed are listed in Table S2B. See also Extended Experi- 
mental Procedures. 

Liver-Specific Gene Expression 

Flag-Rev-erba and GFP cDNAs were subcloned into hepatocyte-specific AAV 
vector AAV8-Tbg (Bell et al., 2011) and tail veins were injected with 1 x 10^^ 
genome copies per mouse. Livers were harvested 2 weeks after injection. 

ACCESSION NUMBERS 

The GEO accession number for the GRO-seq and ChIP-seq data reported in 
this paper is GSE59486. The GEO accession number for the microarray data 
reported in this paper is GSE59460. 
SUMMARY 

The endoribonuclease Dicer is known for its central 
role in the biogenesis of eukaryotic small RNAs/mi- 
croRNAs. Despite its importance, Dicer target tran- 
scripts have not been directly mapped. Here, we 
apply biochemical methods to human cells and 
C. elegans and identify thousands of Dicer-binding 
sites. We find known and hundreds of additional 
miRNAs with high sensitivity and specificity. We 
also report structural RNAs, promoter RNAs, and 
mitochondrial transcripts as Dicer targets. Interest- 
ingly, most Dicer-binding sites reside on mRNAs/ 
lncRNAs and are not significantly processed into 
small RNAs. These passive sites typically harbor 
small, Dicer-bound hairpins within intact transcripts 
and generally stabilize target expression. We show 
that passive sites can sequester Dicer and reduce 
microRNA expression. In both human and worm, mRNAs with 
passive sites were significantly associated 
with processing-body/granule function. Together, 
we provide the first transcriptome-wide map of Dicer 
targets and suggest conserved binding modes and 
functions outside of the miRNA pathway. 

INTRODUCTION 

Genes are subject to posttranscriptional regulation by small RNAs 
(sRNA) and RNA-binding proteins (RBPs). Over the past years, 
many regulatory sRNAs have been discovered (Bartel, 2009; 
Kim et al., 2009). Most miRNAs are generated from primary tran- 
scripts that undergo two distinct steps of processing. First, 
DROSHA and its partner DGCR8 release ~70 nt stem-loop 
precursor miRNAs (pre-miRNAs) inside of the nucleus (Lee et al., 
2003). Alternatively, “mirtrons” are derived from introns in a 
splicing-machinery-dependent and Drosha-independent fashion 
(Okamura et al., 2007; Ruby et al., 2007). The pre-miRNA hairpins 
are further cut by Dicer (Hutvagner et al., 2001). The physical distance 
between the Dicer PAZ and 5' pocket and the RNase III domains 
functions as a molecular “ruler” to control the product size 
(Lau et al., 2012; Park et al., 2011). The resulting ~22 nt miRNA/ 
miRNA* duplexes have a 2 nt overhang at the 3' ends, which is a 
characteristic signature of Dicer processing (Bartel, 2009). The 
duplexes are subsequently handed over to Argonaute (AGO) pro- 
teins, and one of the two strands is selected as the mature miRNA 
to form active RNA-induced silencing complex (RISC), which 
represses the expression of target genes (Bartel, 2009). 

Dicer binding sites are typically indirectly inferred by mapping 
sRNA sequencing data to transcripts. Two adjacent read stacks 
suggest a pre-miRNA that was cleaved by Dicer. A more variable 
3' than 5' end is also typical for miRNAs (Chiang et al., 2010). By 
computationally scoring these and other hallmarks of Dicer pro- 
cessing, Dicer targets are inferred (Friedlander et al., 2012). 
These methods invariably make assumptions about how Dicer 
binds and processes substrates. 

sRNA sequencing revealed diverse sRNA classes, including 
miRNAs that are processed from tRNAs and snoRNAs (Castellano 
and Stebbing, 2013; Ender et al., 2008; Friedlander et al., 2014). 
Currently, more than 1,800 pre-miRNAs in human and 200 in 
C. elegans are listed in miRBase (Kozomara and Griffiths-Jones, 
2014). However, it remains challenging to distinguish miRNAs 
from fragments of other transcripts based on sRNA-sequencing 
data alone (Kozomara and Griffiths-Jones, 2014). There are also 
other types of sRNAs such as endogenous short interfering 
RNAs (endo-siRNAs) originating from long double-stranded 
RNAs (dsRNAs) generated by transposable elements or convergent 
transcription units (Kim et al., 2009). sRNAs from transcript 
starts and ends have also been described (Seila et al., 2008; 
Valen et al., 2011; Zamudio et al., 2014). Recent findings highlight 
that primary sequence, RNA structure, size, and position of stem 
and loop, the accessibility of 3'/5' ends, and cofactors are all relevant 
for Dicer substrate recognition and/or cleavage (Feng et al., 
2012; Fukunaga et al., 2012; Gu et al., 2012; Lau et al., 2012; 
Park et al., 2011; Tian et al., 2014). Furthermore, in vitro studies 
(Feng et al., 2012) and electron microscopy (Taylor et al., 2013) 
show that affinity and cleavage efficiency can be uncoupled properties 
of the bound substrate. In C. elegans, Dicer binds the 
lncRNA rncs-1 in vitro but does not process it to sRNA, leading 
to sequestration of Dicer and inhibition of its function (Hellwig 
and Bass, 2008). 

Other reports likewise point to functions of Dicer that are not 
connected to sRNA production. In human cells, Dicer is present 
Figure 1. PAR-CLIP Identifies Transcriptome-wide Binding of Dicer 

(A) Outline of experimental setup. PAR-CLIP/iPAR-CLIP of Dicer was performed 
in HEK293 cells and C. elegans. Expressed sRNA was profiled by 5'P 
(C. elegans) or AGO2/3-IP (human) deep sequencing. The impact of Dicer and 
Drosha knockdowns on mRNA levels was monitored by mRNA-seq, on sRNA 
expression by sRNA-seq. 

(B and C) Overview of PAR-CLIP experiments in HEK293 cells (B) and in 
C. elegans (C). The right panel shows the phospho-image of SDS-PAGE-resolved, 
radiolabeled Dicer-RNA complex, immunoprecipitated from 4SU-labeled 
and crosslinked HEK293 cells (B) or worms (C). Arrows indicate the 
specific band of RNAs crosslinked to Dicer protein, confirmed by western blot 
(left). 



(“passive” sites). In C. elegans, the top passively bound transcript 
was the lncRNA rncs-1, confirming previous in vitro data 
(Hellwig and Bass, 2008) and indicating that passive sites can 
function by sequestering Dicer. Dicer knockdown experiments 
in both human cells and worms suggest that passive sites, in 
contrast to sites that emit sRNAs, are generally stabilized by 
Dicer. Furthermore, we systematically compared passive Dicer 
targets across human and worm and discovered that messages 
encoding P body/granule components are statistically signifi- 
cantly enriched in both species. 

In summary, our data reveal hundreds of additional miRNAs, 
non-canonical substrates for dicing, and a novel, passive mode 
of Dicer binding. Moreover, we identify a large number of stably 
bound “passive” sites inside of many mRNAs in both human 
and worm. We present multiple lines of evidence that these sites 
are bound by Dicer near the loop of small hairpin structures 
within the intact host RNA, generally stabilizing its expression. 
We discuss several scenarios for the function of passive sites. 

RESULTS 

PAR-CLIP Identifies Transcriptome-wide Binding 
of Dicer in Human Cells and C. elegans 

To identify direct in vivo Dicer targets transcriptome wide, we performed 
PAR-CLIP (Hafner et al., 2010) in human HEK293 cells and 
iPAR-CLIP (Jungkamp et al., 2011) in young adult C. elegans 
worms (Figures 1B and 1C). We used HEK293 cells expressing 
FLAG/HA-tagged DICER1 protein (Figure S1A available online), 
whereas in C. elegans, we used a 3xFLAG/10xHis-tagged 
DCR-1 rescue strain (Figure S1B). For HEK293 cells and worm, 
we combined three independent replicates each (see Figures 
S1C-S1F for reproducibility). After computational quality filtering, 
we identified ~8,500 and ~2,500 reproduced Dicer-binding sites 
in HEK293 cells and C. elegans, respectively. The crosslinking 
introduced specific nucleotide mutations and was not biased toward 
highly expressed genes (Figures S1G and S1H). 



in the nucleus (Doyle et al., 2013; Sinkkonen et al., 2010) and in- 
teracts with nuclear pore complexes (Ando et al., 2011). In 
C. elegans, Dicer localizes to germline P granules and is required 
for their formation (Beshore et al., 2011). Dicer is also present in 
chromatoid bodies of mammalian germ cells (Kotaja et al., 2006). 
If Dicer binding and RNA cleavage are largely uncoupled, many 
interactions may be undetectable with current methods. 

Here, we present an in-depth identification and analysis of 
direct Dicer-binding sites in human embryonic kidney (HEK) 
293 cells and C. elegans by applying PAR-CLIP/iPAR-CLIP (Hafner 
et al., 2010; Jungkamp et al., 2011) (Figure 1). Because mammals 
and nematodes both have a single full-length Dicer gene, 
assumed to process long dsRNAs and miRNAs, we were able 
to compare human and worm modes of Dicer targeting, high- 
lighting core functions conserved during 550 million years of evo- 
lution. To measure the enzymatic activity of Dicer on its binding 
sites, we performed sRNA sequencing. In human cells, we 
deeply sequenced AGO-loaded sRNAs, enabling us to identify 
hundreds of additional miRNAs. 

We also identified in both human and worm thousands of sta- 
bly bound Dicer target sites that were not processed into sRNAs 



Dicer PAR-CLIP Recovers Known miRNAs 

We first investigated canonical miRNAs, exemplified by the hu- 
man “oncomir” cluster miR-17-92 (Figure 2A) and the miR-35- 
41 cluster in C. elegans (Figure 2B). All PAR-CLIP experiments 
reproducibly mapped Dicer binding (Figure 2A and 2B). PAR- 
CLIP reads and crosslink-induced T-to-C mutations cover the 
entire pre-miRNA stem-loop structure. AGO-bound sRNA reads 
align to both sides of the stem and form precise double 
stacks, representing the “mature” and “star” miRNA. Additional 
~11 nt of base-paired RNA beyond the pre-miRNA reflect the requirements 
for Drosha processing (Han et al., 2006) of miR-17 
and miR-40 (Figures 2A and 2B, inserts). 

In contrast to sRNA sequencing, Dicer PAR-CLIP abundantly 
yielded reads spanning the loop region of miRNAs. This was 
further enhanced when the RNase T1 treatment was omitted 
(Figure S2A and S2B). Thus, our PAR-CLIP assay faithfully re- 
covers Dicer binding to intact miRNA precursors. 

To our surprise, Dicer does not bind only pre-miRNAs. We 
mapped Dicer-binding sites to GENCODE, WormBase in 
C. elegans, and miRBase annotations. Only 321 binding sites 
map to 316 individual miRNAs in HEK293 cells and 100 binding 



1154 Cell 159, 1153-1167, November 20, 2014 ©2014 Elsevier Inc. 







sites to 92 miRNAs in C. elegans. The majority of Dicer-binding 
sites map to a wide range of RNA transcripts, including not 
only other structural RNAs, such as tRNAs and snoRNAs, but 
also exonic and intronic mRNA regions (Figures 2C and 2D and 
Table S1). This is also reflected in the distribution of PAR-CLIP 
reads (Figures S2C and S2D). 

Dicer-Binding Sites Explain the Majority of Expressed 
sRNA 

We asked how many of the expressed sRNAs map to Dicer- 
binding sites. We sequenced AGO-loaded sRNAs in HEK293 
cells by immunoprecipitation of stably expressed FLAG/HA- 
tagged AGO2 and AGO3. The two data sets were highly correlated 
(Figure S2E) and subsequently merged (referred to as 
AGOIP). AGOIP read counts were also highly correlated to 
normal 5'-monophosphate-specific (5'-P) sRNA sequencing, 
indicating that the majority of expressed sRNAs are indeed 
loaded into AGO (Figure S2F). Of the aggregate number of 
AGOIP reads from 18-26 nt length, 95% originate from human 
Dicer-binding sites, comparable to the 93% explained by miR- 
Base (Figure 2E). 
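
The "fraction of reads explained" statistic can be sketched as follows. This is a minimal illustration, assuming half-open genomic intervals and a simple any-overlap rule. The function name, tuple layouts, and all coordinates and counts are hypothetical and do not reproduce the actual analysis pipeline.

```python
from bisect import bisect_right
from collections import defaultdict


def fraction_explained(reads, sites, min_len=18, max_len=26):
    """Fraction of read counts (within a length window) overlapping any site.

    reads: iterable of (chrom, start, end, count), half-open coordinates.
    sites: iterable of (chrom, start, end), half-open coordinates.
    """
    # Merge overlapping or touching sites per chromosome.
    merged = defaultdict(list)
    for chrom, start, end in sorted(sites):
        if merged[chrom] and start <= merged[chrom][-1][1]:
            merged[chrom][-1][1] = max(merged[chrom][-1][1], end)
        else:
            merged[chrom].append([start, end])
    starts = {c: [iv[0] for iv in ivs] for c, ivs in merged.items()}

    total = explained = 0
    for chrom, start, end, count in reads:
        if not (min_len <= end - start <= max_len):
            continue  # keep only reads in the requested length window
        total += count
        if chrom not in starts:
            continue
        # Last merged site that begins before the read ends.
        i = bisect_right(starts[chrom], end - 1) - 1
        if i >= 0 and merged[chrom][i][1] > start:
            explained += count
    return explained / total if total else 0.0


# Hypothetical toy data.
toy_sites = [("chr1", 100, 180), ("chr1", 150, 200), ("chr2", 50, 120)]
toy_reads = [
    ("chr1", 110, 132, 950),  # 22 nt, inside a site
    ("chr1", 300, 321, 30),   # 21 nt, outside any site
    ("chr2", 100, 123, 20),   # 23 nt, partially overlapping a site
    ("chr1", 120, 150, 500),  # 30 nt, excluded by the length filter
]
frac = fraction_explained(toy_reads, toy_sites)
```

Because the statistic is weighted by read count, a handful of very abundant sRNAs can dominate it, which is why sensitivity is examined separately as a function of expression level.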

C. elegans expresses at least 27 Argonautes (Gu et al., 2009). 
We therefore relied on conventional sRNA sequencing. Here, 
61% of 18-26 nt reads originate from nematode Dicer-binding 
sites, comparable to 60% explained by miRBase (Figure 2F). 
The previously published PAR-CLIP data sets of single-stranded 
RNA-binding proteins, human ELAVL1 (Lebedeva et al., 2011) 
and C. elegans GLD-1 (Jungkamp et al., 2011), were analyzed 
in the same manner as negative controls and explained less 
than 1% of the expressed sRNA. Worm-specific, triphosphorylated 
endo-siRNAs were sequenced independently and were 
found to be largely unassociated with Dicer-binding sites, 
consistent with previous results on their Dicer-independent pro- 
duction (Gu et al., 2009) (Figure S2G). 

Dicer PAR-CLIP Identifies sRNA-Generating Loci with 
High Sensitivity 

Because the pool of expressed sRNAs can be dominated by a 
few highly expressed miRNAs, we investigated the sensitivity 
of the Dicer PAR-CLIP as a function of sRNA expression. We 
considered all clusters of AGOIP reads that uniquely align to 
the genome as potential sites of sRNA production and selected 
those with a given minimal sRNA output. We then asked what 
fraction of these stacks overlap Dicer-binding sites, known 
miRNAs, or control sites (Figures 2G and 2H). Although Dicer- 
binding sites more often overlap with regions of low and interme- 
diate sRNA expression than the miRBase annotation, the gap 
disappears for highly expressed sRNAs. Thus, although compa- 
rable fractions of total sRNA can be explained by either miRBase 
or Dicer binding, Dicer PAR-CLIP offers enhanced sensitivity for 
lowly expressed sRNAs. 
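
The threshold-based comparison described above can be sketched as a small tabulation. This is a minimal example under stated assumptions: each sRNA cluster is represented by a hypothetical record holding its read count and two precomputed overlap flags, and the cutoffs are arbitrary.

```python
def sensitivity_by_threshold(clusters, thresholds):
    """For each minimal read count, report the number of clusters kept and the
    fraction overlapping Dicer-binding sites and miRBase annotations."""
    rows = []
    for cutoff in thresholds:
        kept = [c for c in clusters if c["reads"] >= cutoff]
        if not kept:
            rows.append((cutoff, 0, None, None))
            continue
        n = len(kept)
        dicer_fraction = sum(c["dicer"] for c in kept) / n
        mirbase_fraction = sum(c["mirbase"] for c in kept) / n
        rows.append((cutoff, n, dicer_fraction, mirbase_fraction))
    return rows


# Hypothetical clusters: read count plus overlap flags.
toy_clusters = [
    {"reads": 5, "dicer": True, "mirbase": False},
    {"reads": 20, "dicer": True, "mirbase": False},
    {"reads": 50, "dicer": False, "mirbase": False},
    {"reads": 200, "dicer": True, "mirbase": True},
    {"reads": 5000, "dicer": True, "mirbase": True},
]
rows = sensitivity_by_threshold(toy_clusters, [1, 100, 1000])
```

In this toy data set the two annotations differ only at low read counts and converge once the cutoff is raised, which mirrors the qualitative pattern described in the text.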

We conclude that Dicer PAR-CLIP identifies loci of sRNA pro- 
duction with high sensitivity and that the majority of AGO-loaded 
or 5'-P sRNAs (at least 95% in HEK cells and at least 61% in 
C. elegans) are probably Dicer products. Furthermore, the con- 
trol data sets for ELAVL1 and GLD-1 demonstrate that our anal- 
ysis of HEK293 and worm PAR-CLIP data has a very low rate 
(<1%) of falsely reporting sites that generate sRNAs. 



Dicer PAR-CLIP Discovers Hundreds of 
Additional miRNAs 

Next, we searched for previously unidentified miRNAs in the Dicer 
PAR-CLIP data. We identified 1,978 human DICER1-binding sites 
with AGOIP read counts above background level (Figure S3A). Of 
these, 1,678 are not listed in miRBase (Figure S3A). We ran 
miRDeep2 (Friedlander et al., 2012) to predict additional miRNAs 
from the AGOIP data independently and subsequently intersected 
these with DICER1-binding sites. This analysis resulted 
in 212 miRNA candidates (not present in miRBase) supported 
by Dicer PAR-CLIP (Table S2). However, we also found many 
Dicer-bound loci with a clean “double-stack” profile of AGOIP 
reads, the signature of processed pre-miRNAs, which were only 
called by miRDeep2 after deactivating the hairpin scoring (Table 
S2). This scoring is meant to assess how likely the hairpin fold of a 
sequence occurred by chance, which reduces false positives but 
also sensitivity. We successfully validated Dicer-dependent pro- 
cessing for three out of three such miRNAs by an in vitro process- 
ing assay (Figure S3B), demonstrating that the PAR-CLIP 
evidence for in vivo interactions allows us to relax the miRDeep2 
filtering. In total, we report 367 new miRNA candidates predomi- 
nantly originating from introns (like known miRNAs), but also from 
a variety of other sources including 5' UTRs (Figures 3A and 3B). 

In C. elegans, only two sites were identified as putative addi- 
tional miRNAs (Figures S3C and S3D), which is not surprising 
considering the very deep sRNA profiling and smaller genome 
(Shi et al., 2013). 

Additionally Identified miRNAs Are Dicer Dependent and 
Interact with mRNAs 

To validate the miRNA candidates not present in miRBase, we 
compared sRNA read counts from control cells to Dicer- or Dro- 
sha-depleted cells (Figures S3E-S3G). Our 5'-P sequencing data 
allowed us to quantify expression changes of 52 miRNA candi- 
dates, which were specifically and highly significantly downregu- 
lated upon Dicer knockdown (Figures 3C, S3H, and S3I). 

Next, we screened publicly available AGO CLIP data for 
chimeric reads (Grosswendt et al., 2014) that would support 
direct miRNA:target interactions. For 20 of our miRNA candi- 
dates, chimeric reads with target 3' UTRs were found (Table 
S2). Although some indicate binding through seed complemen- 
tarity, others show noncanonical miRNA:target interactions. For 
example, the miRNA derived from the 5' UTR of glutamate- 
ammonia ligase (GLUL) appears to bind to the 3' UTR of methyl- 
transferase-like 8 (METTL8) by base-pairing within the central 
region of the miRNA (Figure 3E). 

We further validated four out of four candidates by northern 
analysis. Dicer dependence is reflected either by reduction of 
mature miRNA or by accumulation of pre-miRNA in Dicer-depleted 
cells (Figure 3D). Finally, we employed a luciferase reporter with 
perfect matches to miRNA candidates. By this assay, three out 
of six candidates can regulate targets in vivo (Figure 3F). In summary, 
multiple independent assays support expression, Dicer dependency, 
and functionality of the additionally identified miRNAs. 

Dicer Binds and Cleaves Structural RNAs 

Known and additional miRNAs still account for only a fraction of 
the Dicer targets with sRNA output. We call these sites “active” 
Figure 2. PAR-CLIP Recovers Known miRNAs and Explains the Majority of Expressed sRNA 

(A and B) Dicer PAR-CLIP data for miRNA clusters: human mir-17-92 (A) and C. elegans mir-35-41 (B). PAR-CLIP-binding sites from three independent experiments 
are indicated as gray boxes. Consensus clusters in blue (human) or green (nematode). Human AGO2/3-IP sRNA reads and C. elegans 5'-P sRNA read 
coverage in black. The top insert shows details of the folded pre-miRNA secondary structure. PAR-CLIP read coverage in blue (human) and green (C. elegans). 
Crosslinks in yellow to red, proportional to induced nucleotide conversion frequency. sRNA coverage as black lines, with width indicating expression level. 

(at least 10 AGOIP or sRNA reads) or “highly active” (human:
>1,000 AGOIP reads, C. elegans: >100 sRNA reads) (Figures 
S4A and S4B). Figure 4A shows active Dicer binding to the H/ 
ACA box snoRNA SNORA33, the 3' splice site of an alternative 
exon of Drosha mRNA, and the 5' UTR of glutamate-ammonia 
ligase (GLUL) (Figure S4C shows greater detail). All of these sites 
give rise to ample sRNAs, comparable to a medium-expressed 
miRNA. The C. elegans examples of atypical substrates include 
the C/D box snoRNA F30H5.4 and Y-RNA yrn-1 (Figure S4D),
where cleavage may generate different RNA fragments of ~50 
nt length. Cleavage appears to frequently occur only on one 
side of the stem, suggesting that Dicer may primarily participate 
in its degradation. Expression of Dicer-bound structural RNAs 
indeed increased upon Dicer depletion, as shown by qPCR for 
tRNAs, snoRNAs, vault RNAs, or Y-RNAs in human (Figure S4E) 
and C. elegans (Figures S4F and S4G), independent of whether
sRNA profiles match a clean double-stack profile that would 
resemble pre-miRNA-like processing. 

Many Abundant sRNAs Are Processed with Low 
Precision 

To study sRNA production for these substrates, we first compu- 
tationally scanned binding sites for hairpin structures that could 
explain Dicer binding. These structures set the frame of refer- 
ence to count sRNAs that align to both hairpin arms, to assess 
possible preferences for 5' or 3' arms, and to score the fidelity 
of sRNA processing by recording 5' variability of read stacks 
(Figure 4B and Experimental Procedures). The in silico hairpin 
folds are supported by PARS data, which captures secondary 
structures in vivo (Wan et al., 2014; Figure S4H). 
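
To illustrate how processing fidelity could be scored from the 5' variability of a read stack, the following Python sketch maps the 5' start positions of reads on one hairpin arm to a value between 0 (starts scattered evenly over the window) and 1 (a single 5' start for all reads). The entropy-based formula, the function name `five_prime_fidelity`, and the `window` parameter are assumptions made for this example and are not taken from the Experimental Procedures.

```python
import math
from collections import Counter


def five_prime_fidelity(start_positions, window):
    """Score 5' homogeneity of a read stack (hypothetical formula).

    start_positions: one 5' start coordinate per read.
    window: number of positions at which a read could start (>= 2).
    Returns 1.0 when every read shares one start and about 0.0 when
    starts are spread evenly over the whole window.
    """
    if not start_positions:
        raise ValueError("at least one read is required")
    if window < 2:
        raise ValueError("window must cover at least two positions")
    counts = Counter(start_positions)
    if len(counts) > window:
        raise ValueError("more distinct starts than positions in window")
    total = len(start_positions)
    # Shannon entropy of the 5' start distribution (natural log).
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    # Normalize by the maximum entropy attainable within the window.
    return 1.0 - entropy / math.log(window)
```

With this definition, a stack in which three of four reads share a start receives an intermediate score, so precisely cut and imprecisely cut hairpins can be compared independently of their read depth.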

The resulting picture reveals a large dynamic range (six orders 
of magnitude) of sRNA expression generated by diverse sub- 
strates (Figures 4C and S4I). The most abundant sRNAs origi- 
nate from known miRNAs, whereas Dicer binding to mRNAs, 
on average (but with interesting exceptions like the 5' UTRs of 
GLUL or FLCN), gives rise to very little or no detectable sRNA. 
Expression of sRNAs from mRNAs did not correlate with 
mRNA expression (Figure S4J). 

As reported before (Chiang et al., 2010), the more abundant 
sRNA preferentially arises from the 5' arm of the hairpin. This 
holds for known miRNAs and becomes more pronounced for 
the entire set of active sites (Figure 4D). Interestingly, expression 
level of sRNAs does not automatically correlate with processing 
fidelity (Figure 4E). Although known miRNA precursors are not 
only abundant, but also cut with the highest precision, the 
additional miRNAs and a subset of bound tRNAs and snoRNAs 
identified here are processed with comparable precision despite 
lower sRNA expression. On the other hand, abundant sRNAs 
from mitochondria are apparently produced with very low preci- 
sion (Figures 4F and S4N). 

There are examples of Dicer-bound structural RNAs, mostly 
tRNAs, which show clean stacks of AGO-loaded RNAs, but additional
and more diverse sRNA profiles in 5'-P sequencing (Figure 4F).
Furthermore, upon Dicer knockdown, the sRNA output
from most tRNAs is not substantially reduced (Figure S4K),
whereas mitochondrial sRNAs even increase in abundance
(Figures 4G and S4K). Taken together, Dicer binding to structural
RNAs influences RNA stability, but concomitant sRNA produc- 
tion does not automatically indicate that dicing is also the source 
of the observed sRNA. Rather, abundant RNA species may be 
subject to multiple, different mechanisms of decay, including 
hydrolysis. 

Mitochondrial Transcripts Are Regulated by Human 
and C. elegans Dicer 

Because many Dicer-binding sites map to mitochondrial tran- 
scripts but display low fidelity of sRNA production, we examined 
the expression changes of mitochondrial transcripts upon Dicer 
depletion by qPCR. We reproducibly observed increased mito- 
chondrial transcript levels after 2 days of DICER1 depletion in 
HEK293 cells (Figure 4H). The effect is even more prominent af- 
ter 3 days (Figure S4L), which was confirmed by northern anal- 
ysis (Figure S4M) and was similarly observed in DCR-1 RNAi 
worms (Figures S4O and S4P).

Dicer Is Associated with sRNAs Derived from 
Transcription Start Sites 

As described before (Seila et al., 2008; Valen et al., 2011; Zamudio
et al., 2014), some sRNAs originate from transcription start
sites (TSS). We observe concomitant DICER1 binding around 
TSS of protein-coding genes (Figure S4Q), suggesting that pro- 
moter-associated sRNAs from both strands are, at least in part, 
DICER1 associated. In contrast, the AGOIP peak upstream of 
transcription termination and polyadenylation sites (PAS) does 
not coincide with Dicer binding (Figure S4R). This is consistent 
with recently published data describing PAS-associated sRNA 
as Dicer independent (Valen et al., 2011). 

Reproducible Dicer Binding without Detectable sRNA 
Production 

Although we were able to associate 86% of human Dicer-binding 
sites and 72% of C. elegans sites with local stem-loop folds, the 
majority of these Dicer-bound hairpins (73.4% and 84.5%) ap- 
peared to not give rise to detectable sRNA. These 5,349 human 
and 1,510 C. elegans “passive” sites are particularly enriched in
coding sequences and 3' UTRs (Figures 5A and 5B). 
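
The activity classes used throughout can be summarized in a short Python sketch. It applies the read-count thresholds stated in the text (at least 10 reads for "active"; more than 1,000 AGOIP reads in human or more than 100 sRNA reads in C. elegans for "highly active") and treats everything below 10 reads as "passive". The function name `classify_site` and the species keys are hypothetical, and the sketch assumes that the site has already been assigned to a local hairpin.

```python
# Thresholds as stated in the text; the dictionary keys are illustrative.
HIGHLY_ACTIVE_ABOVE = {"human": 1000, "c_elegans": 100}
ACTIVE_AT_LEAST = 10


def classify_site(read_count, species):
    """Return the activity class of a Dicer-bound hairpin.

    read_count: AGOIP reads (human) or sRNA reads (C. elegans) on the site.
    species: "human" or "c_elegans" (hypothetical labels).
    """
    if read_count < 0:
        raise ValueError("read_count must be non-negative")
    if read_count > HIGHLY_ACTIVE_ABOVE[species]:
        return "highly active"
    if read_count >= ACTIVE_AT_LEAST:
        return "active"
    return "passive"
```

For example, `classify_site(250, "c_elegans")` returns `"highly active"`, whereas the same count in human is only `"active"`.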

Although we already demand that all binding sites are 
supported by at least two out of three independent PAR-CLIP 
replicates, we wanted to ensure that the passive sites indeed 
represent stable interactions with mRNAs. To this end, we per- 
formed independent Dicer immunoprecipitations without cross- 
linking or RNase treatment and assayed the bound RNAs by 
semiquantitative RT-PCR. In each case, the amplicons were 
set outside of the identified Dicer-binding sites, and the RT 



(C and D) Annotation breakdown of binding sites in human (C) and C. elegans (D). 

(E and F) Barplots of size and 5' nt distribution of sRNA reads. Top half, left to right: sRNA originating from Dicer-binding sites, miRBase miRNAs (pos. control), and
ELAVL1/GLD-1-binding sites (negative control). Bottom half: sRNA from remaining transcribed parts of the genome. (E) human, (F) C. elegans.

(G and H) Barplot showing the fraction of sRNA read stacks that overlap with Dicer-binding sites (black), miRBase miRNAs (dark gray), or ELAVL1/GLD-1-binding
sites as controls (bright gray), as a function of minimal sRNA read count in human (G) and C. elegans (H).
Figure 3. Identification and Validation of miRNA Targets

(A) Annotation breakdown of identified (top) and known miRBase (bottom) miRNAs. 

(B) Examples of identified miRNAs of different human genomic origin. Reduced miRDeep2 plots show the precursor hairpin structure and the coverage of mature 
(red), star (violet), and loop (yellow) sequences. 

(C) Cumulative distribution of changes in small RNA expression upon Dicer knockdown (siRNA 2). The distribution of log2 fold changes of small RNA read counts 
of tRNAs (gray), snoRNAs, and snRNAs (green), known miRNAs (dark blue), and miRNAs identified in this analysis (blue, dashed). Known (p < 1 x10“®®, Mann- 
Whitney U, n = 512) and newly identified (p <1x10“^®, MWU, n = 52) are significantly more downregulated after Dicer depletion than tRNAs (n = 513). 

(D) Northern blots of human let-7a (pos. control) and four miRNAs validate the ~22 nt mature product and either reduction of mature (let-7a, GLUL) or increase of 
precursor (AURKB, SNORA56, SNORA32) upon Dicer-depletion. 

(E) Chimeric reads from AGO-CLIPs, containing miRNAs (top) and target fragments (below, fragments are extended; original chimeric part is bold) support in vivo 
interaction. Vertical bar: Watson-Crick pairing; colon: G-U wobble. Dashes indicate bulged nucleotides in the paired sequence. 

(F) Luciferase reporter assay with perfectly matching (PM) or mismatched (MM) binding sites controlling Renilla expression. Barplot: Ratios of Renilla to firefly 
(control). Error bars indicate SD. 



reaction was oligo-dT primed to specifically test for binding to 
intact mRNAs. By this assay, nine out of ten passive Dicer-binding
events were validated in both human and worm (Figures 5C,
5D, S5A, and S5B), also with an antibody against endogenous 
DICER1 (Figure S5A). 



In C. elegans, the most strongly bound passive target of 
DCR-1 in vivo is rncs-1 (Figure 5E). This lncRNA forms a long
double strand that was previously described to bind DCR-1 
with high affinity in vitro while being resistant to dicing due to 
secondary structures at the ends of the long stem (Hellwig and 
Bass, 2008). The absence of abundant sRNAs from rncs-1
argues for the absence of efficient dicing, consistent with the 
proposed role of rncs-1 as a stable competitor for other Dicer 
substrates (Hellwig and Bass, 2008). 

The transcript with the largest number of passive DICER1- 
binding sites in human is the mRNA encoding DICER1 itself. It 
is covered with 39 PAR-CLIP clusters, but like rncs-1 in worm, 
these sites do not emit sRNAs (Figure 5F). Another strongly 
bound example is a binding site in the 3' UTR of vesicle-associ- 
ated membrane protein-associated protein B/C (VAPB) (Fig- 
ure 5G). The sites constitute imperfect stem-loop structures, 
similar to pre-miRNAs, but with larger bulges and lacking the 
distinct 11 nt stem segment that would be required for Drosha
processing. 

Passive Sites Can Interfere with Catalytic Dicer Activity

As rncs-1 expression was shown to inhibit the catalytic activity of
DCR-1 via sequestration of DCR-1 (Hellwig and Bass, 2008), we 
tested whether this is also possible with human passive sites. 

We expressed four mRNA transcripts containing passive hair- 
pins of different origin (DICER1, SLC2A1, TARBP2, VAPB) to 
high levels (in the range of GAPDH) in HEK293 cells (Figure S5C). 
Expression of four out of four passive hairpins reproducibly 
reduced the levels of endogenously expressed miRNAs by 
20%-30% in comparison to constructs bearing mutations that 
disrupt the secondary structure (Figure 5H), suggesting that 
these interactions are sufficiently strong to functionally 
sequester Dicer protein in vivo. Dicer protein levels were not 
affected (Figure S5C). Taken together, our data show that Dicer 
stably and reproducibly binds to specific stem-loop structures 
inside of intact mRNAs without dicing. 

Passive Dicer Targets Are Functionally Linked to RNA 
Granules 

Mammalian and C. elegans Dicer is essential for the germline 
(Knight and Bass, 2001; Murchison et al., 2007). A hallmark of animal
germ cells is the presence of RNA granules, perinuclear aggregates of
RNA and protein. In C. elegans, Dicer is present in and required 
for assembly of these granules (Figure S5D and Beshore et al., 
2011). As Dicer is a large protein with many protein interactors 
and passive binding to mRNAs does not involve RNase activity, 
we hypothesized that Dicer may additionally play a role in aggre- 
gating bound mRNAs to RNA granules. We intersected lists of 
genes known to be required for the proper formation of RNA 
granules with Dicer targets. To qualify as a passive Dicer target, 
we conservatively demand that a gene must not contain a single 
Dicer-binding site with sRNA output exceeding background 
levels (Experimental Procedures). We find that both human 
(p < 0.015, Fisher’s exact test) and worm (p < 0.003 for P bodies,
p < 2.3 X 1 0“® for P granules) passive target transcripts of Dicer 
are statistically significantly enriched for genes associated with 
granules (Updike et al., 2011) (Figures 5I and 5J). For many of
these genes, it has been demonstrated that both their mRNAs
and proteins are localized to granules (Schisa et al., 2001). Of
note, the enrichment of P-body-associated genes among Dicer 
targets in HEK293 cells is almost entirely due to passive targets 
and becomes even more significant for the most strongly bound 
passive targets with most crosslink events. In contrast, active
targets are not enriched for granules. The association of passive
Dicer binding with RNA granules suggests a role in mRNA local- 
ization. Indeed, our FLAG-tagged DCR-1 protein recapitulates 
the granular localization of wild-type DCR-1 in the C. elegans 
germline (Figures 5K and S5E and Beshore et al., 2011). 
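
The enrichment of granule-associated genes among passive targets can be illustrated with a one-sided Fisher's exact test, which for an overlap of this kind equals a hypergeometric tail probability. The Python sketch below uses only the standard library; the function name `fisher_enrichment_p` and its argument names are hypothetical, and the gene counts in the usage note are invented for illustration rather than taken from our data.

```python
from math import comb


def fisher_enrichment_p(overlap, targets, annotated, universe):
    """One-sided Fisher's exact test for over-representation.

    overlap:   genes that are both targets and annotated (e.g., granule genes)
    targets:   number of target genes (e.g., passive Dicer targets)
    annotated: number of annotated genes in the universe
    universe:  number of all genes considered
    Returns P(X >= overlap) when `targets` genes are drawn at random.
    """
    if not (0 <= overlap <= min(targets, annotated) <= universe):
        raise ValueError("inconsistent counts")
    max_overlap = min(targets, annotated)
    # Sum hypergeometric probabilities over all overlaps at least as large.
    favourable = sum(
        comb(annotated, k) * comb(universe - annotated, targets - k)
        for k in range(overlap, max_overlap + 1)
    )
    return favourable / comb(universe, targets)
```

As a toy usage example, `fisher_enrichment_p(5, 5, 5, 10)` asks how likely it is that all five targets fall among five annotated genes in a universe of ten, which is 1/252.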

Human passive targets of Dicer are also significantly (p < 2.8 x 
10“^) enriched for 3' UTR targets of Staufen (Ricci et al., 2014), 
presumably due to the presence of double-stranded structures 
in their common targets. However, whereas Staufen binds to 
long stems (Ricci et al., 2014), Dicer binding in HEK293 cells 
seems restricted to the tip of the structure and requires contact 
to the loop region of hairpins (Figure S6B). 

Homologous, Passive Targets in Human and C. elegans 

We find 82 genes with homology between human and worm 
to be passive Dicer targets in both species. An interesting 
example is the 3' UTR of the germline helicase CGH-1 (Figure 5L): 
the human ortholog DDX6 is also bound by DICER1 via its 3' 
UTR (Figure 5M). Although the comparison of an animal and
a human cell line across ~550 million years of evolution does not
allow conclusions about conservation of function, we find the prominent
appearance of ancient genes linked to RNA granules and the 
germline intriguing and report passive, homologous targets in 
Table S3. 

Active and Passive Sites Differ in Secondary Structure 

To characterize the differences between active and passive 
sites, we grouped Dicer sites by their activity (sRNA output level). 
Taking the center of the loop as a point of reference, we 
computed average profiles of structure, crosslinking, and other 
features as a function of position within the stem loop. 
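
A minimal Python sketch of such a position-wise average is given below. Each site is represented by a list of per-nucleotide values (for example, base-pairing probability or crosslink frequency) together with the index of its loop center; the function name `average_profile` and this input layout are assumptions made for illustration.

```python
def average_profile(sites, flank):
    """Average a per-nucleotide feature in a fixed window around loop centers.

    sites: iterable of (values, center) pairs, where `values` is a list of
           numbers along the transcript and `center` is the loop-center index.
    flank: number of positions to include on each side of the center.
    Returns a list of length 2 * flank + 1; an entry is None when no site
    covers that offset.
    """
    width = 2 * flank + 1
    sums = [0.0] * width
    counts = [0] * width
    for values, center in sites:
        for offset in range(-flank, flank + 1):
            index = center + offset
            # Skip offsets that run past either end of this site's sequence.
            if 0 <= index < len(values):
                sums[offset + flank] += values[index]
                counts[offset + flank] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]
```

Computing this profile separately for each activity class gives curves that can be overlaid and compared position by position.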

Known miRNAs have a characteristic fold with an unpaired 
loop region, flanked by paired bases within the stem of the 
pre-miRNA. Importantly, separated by a bulge, the stem extends
for another ~11 nt beyond the pre-miRNA boundaries, which allows
processing by DROSHA/DGCR8 (Han et al., 2006). This
profile is observed for active sites (Figures 6A and 6B), including 
the miRNA candidates identified in our analysis, and is less pro- 
nounced for sites with lower activity. Among the active sites, 
base-pairing stability correlates with sRNA output, which, how- 
ever, appears to be optimal for intermediate folding energies 
(Figure S6C). In contrast, base-pairing within passive sites decays
steadily to background levels beyond ~15 bp of stem. Figures
S6D-S6G show additional profiles of G/C content and
bulges. 

Passive Sites Preferentially Crosslink within the Loop,
Active Sites at the Ends of Precursors 

If passive sites are indeed within intact mRNA transcripts, in 
contrast to pre-miRNA-like hairpins, no 5' and 3' ends should 
be accessible to Dicer. This difference could be reflected by 
the frequency of crosslink-induced transitions along the stem- 
loop structure because crosslinking requires close spatial prox- 
imity (Hafner et al., 2010). 

Crosslinks between Dicer and HEK293-expressed miRNAs 
peak in three distinct regions: within the loop and at the 5' and 
3' ends of the pre-miRNA (Figure 6C). The loop can interact 
with the Dicer helicase domain (Taylor et al., 2013), whereas 
the ends are coordinated by the 5' pocket and PAZ domain,
which is required for “setting the ruler” and efficient dicing (Ma- 
cRae et al., 2007; Park et al., 2011). This pattern is observed for 
all active sites, supporting that active sites are bound as pre- 
miRNA-like structures cleaved out of larger transcripts. In 
contrast, passive sites lack the peaks at precursor-size distance 
(Figure 6C). In C. elegans, the same difference between active 
and passive sites at the precursor termini is observed. However, 
active sites display much less crosslinking within the loop region 
(Figure 6D), consistent with the dispensability of the helicase 
domain for miRNA production in C. elegans (Welker et al., 2011). 

Passive Sites Are Not Engaged by DGCR8 and Are Not 
Cleaved 

For humans, we additionally interrogated DGCR8 HITS-CLIP 
data (Macias et al., 2012) and 5'-monophosphate sequencing 
of endonucleolytic RNA cleavage products (Karginov et al., 
2010). Active sites and known miRNAs display prominent shoul- 
ders of DGCR8 HITS-CLIP signal, flanking the pre-miRNA and 
indicating processing by the Drosha/DGCR8 complex. This 
signal is absent from passive sites (Figure 6E). Moreover, binding 
of the Dicer-interacting proteins PACT and TARBP2 (Goodarzi 
et al., 2014) was only detected for loci with sRNA output (Figures
S6H-S6J). Consistently, only active sites, but not passive sites, 
are supported by cleavage products aligning to either the pre- 
miRNA itself or the 3' fragments of the primary transcript. 

We conclude that the passive Dicer-binding sites observed in 
human and nematode represent local stem-loop structures, 
which structurally differ from typical miRNAs and are conse- 
quently not cleaved by Drosha or Dicer. Rather, they constitute 
structural elements within intact transcripts. 

Functional Consequences of Dicer Binding and Dicing 

To assess the significance and biological consequences of 
widespread Dicer binding to mRNAs, we performed Dicer 
knockdowns in HEK293 cells and C. elegans, followed by 
mRNA sequencing. To limit indirect effects due to reduced 
miRNA production, we chose the earliest time point at which 
Dicer protein levels were significantly reduced while miRNA 
levels were stable. As an additional control, we knocked down 
Drosha (Figures S7A and S7C-S7F). 



In C. elegans, miRNA expression changes during develop- 
ment, and Dicer depletion causes pleiotropic effects, including 
sterility. Our best attempt to study the impact of Dicer binding 
on mRNAs therefore consists of choosing a developmental stage 
when Dicer can be significantly depleted (Figures S7B, S7G, and 
S7H) while the animals are viable. We used the fem-1(hc17) strain
(inducible sterility) to obtain sterile control animals. 

As the sequencing-derived log fold changes correlated well with 
qPCR measurements on independent biological replicates (Fig- 
ure S7M), we compared the distribution of changes between 
different groups of transcripts. Overall, the observed changes at 
early time points are small. However, in both HEK293 cells and 
C. elegans, we observed statistically significant effects on Dicer 
target transcripts. mRNAs containing active Dicer-binding sites 
were significantly stabilized upon Dicer depletion, whereas pas- 
sive targets overall were more destabilized in both worms and cells 
(see Figure S7N for mRNA decay). The most strongly bound pas- 
sive targets, as well as the homologous passive targets of DICER1 
and DCR-1, show the strongest effects (Figures 7A, S7I, and 7B).
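
The comparison of fold-change distributions between a group of target transcripts and a background set can be sketched with a two-sided Mann-Whitney U test. The standard-library Python version below uses average ranks for ties and a normal approximation for the p value without tie or continuity correction; these simplifications, and the function name `mann_whitney_u`, are assumptions of this example and not a description of the software used for the analysis.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, no tie correction).

    x, y: sequences of log fold changes for two groups of transcripts.
    Returns (U statistic of x, approximate two-sided p value).
    """
    if not x or not y:
        raise ValueError("both groups must be non-empty")
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        # Tied values share the average of ranks i + 1 .. j.
        for k in range(i, j):
            ranks[k] = (i + 1 + j) / 2.0
        i = j
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u_x - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return u_x, p_two_sided
```

A U value below `len(x) * len(y) / 2` indicates that the first group tends toward lower fold changes than the second, which is the direction expected for destabilized passive targets relative to all genes.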

In contrast, Drosha depletion had almost no effect on passive 
targets in HEK293 cells (Figures S7J and S7K), ruling out that the 
observed regulation stems from global loss of miRNAs. In 
C. elegans, where results strongly depended on the develop- 
mental stage, Drosha depletion stabilized both passive and 
active targets (Figure S7L). 

DISCUSSION 

The rapid increase of sequencing data has led to the discovery of 
many sRNAs that have been linked to Dicer activity. However, 
sRNA data capture only the endpoint of a cascade of processing 
events on a background of degradation products, and the indi- 
rect identification of Dicer binding from sRNA requires assump- 
tions about miRNA-like processing. Here, we biochemically 
identified thousands of in vivo Dicer-binding sites in a human 
cell line and C. elegans. 

Although we recovered known miRNAs with high sensitivity, 
these account for only a fraction of our data. We identified and 
validated many additional miRNAs. Most of these are lowly ex- 
pressed and would not have been detected without direct evi- 
dence for Dicer binding. 



Figure 4. Dicer-Binding Sites in Different RNA Substrates Display a Wide Dynamic Range of sRNA Expression and Processing Fidelity 

(A) Dicer binding to noncanonical structural RNA: snoRNA SNORA33, insert shows structure as in Figure 2A, 3' splice site of DROSHA mRNA, 5' UTR of GLUL
mRNA in HEK293 cells. Dark blue: exon/intron structure of bound transcript; light blue: Dicer-binding site, AGOIP coverage in black.

(B) Schematic of the iocai hairpin search. Binding sites (dark gray) are padded with flanking sequence (light gray) and are folded with RNAsubopt. The best hairpin 
is selected and intersected with aligned AGOIP/sRNA reads. Numbers and positions of alignments used to score output, left/right arm asymmetry, and fidelity of
sRNA processing are shown. 

(C) Cumulative frequency of AGOIP read counts for different categories of Dicer-binding sites in HEK293 cells. 

(D) Small RNA production is biased toward the left arm. Histogram of sRNA read coverage asymmetry across all sRNA precursors in human (top) and C. elegans 
(bottom). Dicer-bound active hairpins (bright blue and green, respectively) and miRBase annotated human and C. elegans pre-miRNAs (dark gray).

(E) Cumulative frequency of AGOIP read stack fidelity (0 for randomly scattered reads, 1 for a single 5' start position for all reads) for different categories of Dicer- 
binding sites in HEK293 cells. 

(F) Examples of AGO-loaded small RNA sources. Black: AGOIP reads, blue: 5'P sRNA control, purple: 5'P sRNA DICER1 KD2. Uniquely aligning reads 18-25 nt.
Dashed lines: all reads. 

(G) Dicer (blue boxes) and neg. control QKI and ELAVL1 (gray boxes) binding along mitochondrial transcripts (middle, blue) from the plus strand (top) or minus
strand (bottom). sRNA output in blue (control) and purple (DICER1 KD with siRNA2). Amplicons for qPCR are marked by rectangles. 

(H) Barplot with qPCR changes of mitochondrial transcripts upon Dicer KD. Error bars represent SEM. *p < 5%, **p < 1%, two-sided, two-sample t test on delta
Ct, unequal variance, n = 3. 
Furthermore, the intersection of Dicer binding, sRNA
sequencing, and AGOIP allowed us to investigate Dicer interac- 
tions with structural RNAs. We found not only unusual patterns of 
dicing, but also Dicer-independent, AGO-loaded sRNAs, pre- 
sumably generated by other modes of processing or degrada- 
tion. As Dicer-bound structural RNAs accumulated upon Dicer 
depletion (Figures S3F and S3G), regardless of the fidelity of their 
processing, this opens the possibility that, rather than producing 
trans-acting miRNAs, Dicer may contribute to or interfere with
the degradation of many structural RNAs. Dicer is required for 
germline maintenance and early embryogenesis (Knight and 
Bass, 2001; Murchison et al., 2007), and its expression changes
during differentiation. This observation could, in part, explain 
why the pool of tRNAs differs between proliferating and differen- 
tiated cells (Gingold et al., 2014). 

To our surprise, depletion of Dicer, on average, increased the 
levels of mature mRNAs hosting active sites, suggesting that 
there can be some form of feedback to DROSHA/DGCR8 processing.
Components of the sRNA pathway are known to
auto- and cross-regulate each other (Bennasser et al., 2011;
Han et al., 2009), and many, including three AGOs, TARBP2,
and DROSHA, are also targeted by DICER1. Indeed, Dicer and
Drosha levels were not independent in perturbation experiments. 
However, at the early time point in Figures 7A and 7B, Drosha 
levels were unaffected. 

Based on studies in nematodes, it has been suggested that 
long dsRNA represents the archetypal substrate for Dicer, linked 
to its supposed ancient role in defense against dsRNA viruses or 
genome-invading elements (Sarkies and Miska, 2013). However,
it appears that binding of human, somatic Dicer requires contact 
to the loop region, presumably utilizing the helicase domain, as 
we observe prominent crosslinking to hairpin loops. Indeed, 
we find almost no long dsRNA binding in HEK293 cells (Figures 
S4S, S6A, and S6B), in line with other findings that somatic
mammalian cells lack endo-siRNA biogenesis (Nejepinska 
et al., 2012). 

Our unbiased look at Dicer-binding sites pictures a continuum 
of sRNA expression from diverse substrates and surprisingly en- 
compasses a large set of non-diced, passive binding sites that 
were previously undetectable. This class of passive binding 
sites, predominantly residing in mRNAs, can be stably bound
without endonuclease activity. This observation agrees with previous
reports indicating that Dicer is not able to efficiently process
RNA if no free ends are available (Fukunaga et al.,
2014; Zhang et al., 2002) and explains the resistance of the
lncRNA rncs-1 against dicing (Hellwig and Bass, 2008).

Our data argue that passive sites are not transient interactions 
and that they can stabilize expression of their host transcripts. 
We note that these effects were strongest for mRNAs that 
were most strongly crosslinked or are also passive targets in 
the other species. 

What could be the function of passive Dicer sites besides 
generally stabilizing target RNAs? First, we have shown that 
high expression of passive sites can sequester Dicer protein 
from other target transcripts and can interfere with its catalytic 
activity, analogous to lncRNA rncs-1 in C. elegans. In these experiments,
we expressed a few thousand passive sites per
cell. It is not trivial to estimate the total copy number of all passive 
sites, but according to our data, a high number of passive sites 
may be naturally present. These sites could have a buffering 
function and may compete with active sites for Dicer binding. 
Second, the proteins encoded by passively bound mRNAs can 
immediately interact with Dicer, as they are translated. This 
would be a mechanism for efficient buildup of protein com- 
plexes. In support of this idea, several mRNAs with passive sites 
both in human and worm encode known Dicer interactors 
(DRH-1, LIN-41, TARBP2, DHX9). Third, many passively bound 
mRNAs encode RNA-granule-associated genes, and we show 
that Dicer protein itself is granularly localized in the C. elegans 
germline. Taken together with the observation that Dicer is 
required for RNP granule formation (Figure S5D and Beshore 
et al., 2011), this indicates that passive binding may be important 
for RNA localization and assembly of RNA-protein complexes. 
This finding could explain the diverse although overall weak ef- 
fects observed upon DICER1 knockdown in HEK293 cells and 
suggests that germ cells, forming prominent RNP structures, 
are a more suitable system to investigate this aspect of Dicer 
biology. Along the same line, the fem-1 strain of C. elegans forms
large and prominent P granules in its arrested oocytes (Schisa 
et al., 2001), and the observed changes of passive targets 
upon DCR-1 depletion were more pronounced in these animals 
than in the wild-type (data not shown). Independently supporting 



Figure 5. Most mRNA Sites Are Not Substrates for Dicer Cleavage 

(A and B) Pie charts of genomic annotation of passive binding sites for human (A) and C. elegans (B). 

(C and D) Validation of catalytically passive Dicer targets by RIP RT-PCR. RT-PCR on RNA from Dicer IP validated nine out of ten tested in HEK293 cells (C) and
nine out of ten in C. elegans (D). RT-PCR for abundant GAPDH mRNA served as negative controls for human. Mouse IgG IP and RT-PCR for three abundant
transcripts: K02B12.7, NPL4.1 as negative control in C. elegans. RT-PCR reaction primed with oligo-dT primer and amplicon set outside of PAR-CLIP-binding
sites. *MEX-3 has low PAR-CLIP coverage, is excluded from the strict consensus list, and was included here for comparison.

(E) C. elegans highly double-stranded lncRNA rncs-1 is densely bound, but not efficiently cleaved by DCR-1. Top: secondary structure as in Figures 1D and 1E
with zoomed region to show crosslinks. Bottom: sense (antisense) aligned distinct PAR-CLIP reads are indicated as green lines above (below) the gene structure.
Shades of green for replicates. Thick blue lines: exons, green boxes: consensus binding sites. sRNA read coverage of rncs-1 locus in black.

(F and G) Examples of passive Dicer-binding sites in HEK293 cells: DICER1 mRNA (F), 3' UTR of VAPB mRNA (G). Dicer binding sites: blue boxes. AGOIP sRNA
reads in black. Top inserts show predicted folding of Dicer-bound hairpins overlaid with PAR-CLIP data as in Figures 2A and 2B.

(H) Sponge assay. Barplots with the ratio of miRNA levels in cells expressing passive hairpins relative to mutated hairpins. Error bars represent SEM. *p < 5%,
**p < 1%, two-sided, two-sample t test, unequal variance, n = 3.

(I and J) Barplots of -log10 p value (one-sided Fisher’s exact test) for target enrichment of human (I) and C. elegans (J) Dicer. Vertical line demarcates p = 5%.
(K) FLAG-tagged DCR-1 expression in C. elegans germline RNP granules. Immunohistochemistry was performed on extruded gonads from adult worms using
monoclonal anti-FLAG (green) antibody. Overview and higher-magnification view of adult germline are provided (Figure S5E). Cell nuclei were counterstained with
TO-PRO (blue). Scale bar, 10 µm.

(L and M) Passive binding sites in the homologous targets conserved germline helicase-1 (CGH-1, C. elegans) and DEAD box RNA-helicase 6 (DDX6, human).
Figure 6. Secondary Structure and Dicer Crosslinking as a Function of sRNA Expression

(A and B) Average base-pairing probability at each nucleotide position in fixed windows around the loop center for human DICER1 (A) and C. elegans DCR-1-
binding sites (B). Catalytic activity levels correspond to sRNA read count: highly active >1,000 (>100), active >10, passive < 10 for human (C. elegans). ELAVL1/
GLD-1 (single-stranded) binding sites as negative, miRBase miRNAs as positive control. Arrows point at the additional 11 nt of stem, characteristic for DROSHA/ 
DGCR8 substrates. 

(C and D) Average occurrence of crosslink mutations at each nucleotide position in fixed windows around the loop for human DICER1 (C) and C. elegans DCR-1 
(D) bound hairpins. Activity levels as above. Arrows point at the peaks of crosslinking at the 5' and 3' ends of precursor structures, absent from passive sites. 

Figure 7. Functional Consequences of Dicer Binding and Dicing

(A and B) Cumulative frequencies of mRNA log2 fold changes for all detected genes (gray), catalytically active (orange/red), and passive targets (blue, most
strongly bound: bright blue), as well as a conserved set of Dicer targets (black) in HEK293 cells (A) and C. elegans (B). Inserts: western blot validation of Dicer 
knockdowns. HEK293: active p = 3.95 x 10“"^ (n = 753), passive p = 1.586 x 10“^ (n = 2326), top passive p = 2.349 x 10“^ (n = 456), conserved passive p = 
4.337% (n = 78), all genes: n = 10,007. C. elegans: any active p = 2.287 x 10“^ (n = 107), active p = 1.233% (n = 20), passive p = 1.695 x 10“’’® (n = 804), top passive
p = 7.412 x10“® (n = 165), conserved passive p = 4.269 x10“^ (n = 75), all genes: n = 8501. All p values by double-sided Mann-Whitney U. 

(C) Model of Dicer-RNA interactions and function. 



the idea that passive Dicer binding may influence mRNA localiza- 
tion, Dicer was recently found to be a shuttling protein (Doyle 
et al., 2013), and both human and C. elegans Dicer interact 
with nuclear pore complexes (Ando et al., 2011). 

In summary, we mapped transcriptome-wide Dicer binding 
in human and C. elegans, identified and validated numerous 



additional miRNAs, disentangled the contribution of Dicer to 
the degradation of many structural RNAs, and unveiled passive 
binding of Dicer with possible functions outside of sRNA path- 
ways (Figure 7C). Also, the microprocessor components 
DGCR8/Pasha and DROSHA have recently been shown to 
have miRNA-independent cellular functions (Gromak et al.,



(E) Average presence of at least one DGCR8 HITS-CLIP (Macias et al., 2012) read for each nucleotide position in fixed windows around the loop for human 
DICER1-bound hairpins. Activity levels as above. Arrows point at the additional shoulders at the base, outside of pre-miRNAs, absent from passive sites.

(F) Average presence of at least one endocleaved 5'-P bearing sequencing read (Karginov et al., 2010) for each nucleotide position in fixed windows around the 
loop for human DICER1-bound hairpins. Activity levels as above. Drosha/DGCR8 processing produces one peak starting at the 5' end of the pre-miRNA and one
at the remaining 3' fragment. Both are absent from passive sites. 
2013; Macias et al., 2012). Altogether, this indicates that the
versatile functions of these ancient RBPs may have been over- 
shadowed by a focus on miRNAs and may need to be carefully 
re-examined. 

EXPERIMENTAL PROCEDURES 

Standard molecular biology techniques and lists of reagents as well as compu- 
tational analyses are described in the Extended Experimental Procedures. 

PAR-CLIP and iPAR-CLIP HEK293 Cells 

PAR-CLIP on HEK293 cells stably expressing FLAG/HA-DICER1 was per- 
formed as described (Hafner et al., 2010). In vivo PAR-CLIP was performed 
on the DCR-1::FLAG rescue strain (BB92; dcr-1(ok247)III,uuEx18) as described
previously (Jungkamp et al., 2011). 

AGO-Associated sRNA Cloning and Sequencing 

Immunoprecipitation of FLAG/HA-tagged AGO2/3 cells was performed with
FLAG magnetic beads. RNA was isolated, ligated to 3' and 5' adapters, 
reverse transcribed, and PCR amplified and sequenced. 

sRNA Sequencing 

sRNA sequencing was performed from 5 μg total RNA according to the stan-
dard Illumina sRNA library preparation protocol. For 5' triphosphate sRNA
sequencing, RNA was treated with 5' polyphosphatase before library 
preparation. 

ACCESSION NUMBERS 

Sequencing data have been deposited in the GEO database under GSE55333. 

SUPPLEMENTAL INFORMATION 

Supplemental Information includes Extended Experimental Procedures, seven 
figures, and four tables and can be found with this article online at
http://dx.doi.org/10.1016/j.cell.2014.10.040.
SUMMARY

The fungal meningitis pathogen Cryptococcus neo- 
formans is a central driver of mortality in HIV/AIDS. 
We report a genome-scale chemical genetic data 
map for this pathogen that quantifies the impact of 
439 small-molecule challenges on 1,448 gene knock-
outs. We identified chemical phenotypes for 83% of
mutants screened and at least one genetic response 
for each compound. C. neoformans chemical-genetic 
responses are largely distinct from orthologous pub-
lished profiles of Saccharomyces cerevisiae, demon-
strating the importance of pathogen-centered
studies. We used the chemical-genetic matrix to pre- 
dict novel pathogenicity genes, infer compound 
mode of action, and develop an algorithm, O2M,
that predicts antifungal synergies. These predictions 
were experimentally validated, thereby identifying 
virulence genes, a molecule that triggers G2/M arrest 
and inhibits the Cdc25 phosphatase, and many com- 
pounds that synergize with the antifungal drug flu- 
conazole. Our work establishes a chemical-genetic 
foundation for approaching an infection responsible 
for greater than one-third of AIDS-related deaths. 

INTRODUCTION 

Invasive fungal infections are notoriously difficult to diagnose 
and treat, resulting in high mortality rates, even with state-of-
the-art treatments. The three most common pathogenic agents
are Cryptococcus neoformans, Candida albicans, and Asper- 
gillus fumigatus (Mandell et al., 2010). These organisms are 
opportunistic fungi that prey on individuals with varying degrees 
of immune deficiency. Susceptible patient populations include 
premature infants, diabetics, individuals with liver disease, 
chemotherapy patients, organ transplant recipients, and those 
infected with HIV (Mandell et al., 2010). Compounding the clinical
challenge is the slow pace of antifungal drug development: only 
a single new class of drugs (the echinocandins) has been 
approved for use in the United States in the last 30 years (Butts 
and Krysan, 2012; Mandell et al., 2010; Roemer et al., 2011). 

Fungal infections are estimated to cause 50% of deaths 
related to AIDS and have been termed a “neglected epidemic” 
(Armstrong-James et al., 2014). The fungus chiefly responsible 
for deaths in this population is C. neoformans (Armstrong-James 
et al., 2014). C. neoformans is an encapsulated basidiomycetous 
haploid yeast distantly related to Saccharomyces cerevisiae and 
Schizosaccharomyces pombe. A 2009 CDC study estimated 
that ~1 million infections and ~600,000 deaths annually are 
caused by C. neoformans, exceeding the estimated worldwide 
death toll from breast cancer (Lozano et al., 2012; Park et al., 
2009). C. neoformans is widespread in the environment and 
exposure occurs through inhalation of desiccated yeast or 
spores (Heitman et al., 2011). In immunocompromised patients, 
C. neoformans replicates and disseminates, causing meningo- 
encephalitis that is lethal without treatment (Heitman et al., 
2011). Induction therapy involves flucytosine and intravenous in-
fusions of amphotericin B (Loyse et al., 2013). Both drugs are
highly toxic and difficult to administer, and neither is readily available
in the areas with the highest rates of disease. The current recom-
mendation for cryptococcosis treatment is at least a year of ther-
apy, which is difficult to accomplish in resource-limited settings
(WHO, 2011). Thus, as is the case with infections caused by 
other fungal pathogens, effective treatment of cryptococcal in- 
fections is limited by the efficacy, toxicity, and availability of cur- 
rent pharmaceuticals. 

We implemented chemogenomic profiling to approach the 
challenges of therapeutic development in C. neoformans. This 
method involves the systematic measurement of the impact of 
compounds on the growth of defined null mutants to produce 
a chemical-genetic map. Such a map represents a quantitative 
description composed of numerical scores indicative of the 
growth behavior of each knockout mutant under each chemical 
condition. Cluster analysis of the growth scores for large 
numbers of mutants under many chemical conditions can reveal
genes that function in the same pathway and even those whose 
products are part of the same protein complex (Collins et al., 
2007; Parsons et al., 2004; Parsons et al., 2006). In addition, 
the identity of genes whose mutation produces resistance or
sensitivity is useful for uncovering compound mode of action 
(MOA) (Hillenmeyer et al., 2008; Jiang et al., 2008; Nichols 
et al., 2011; Parsons et al., 2006; Xu et al., 2007; Xu et al., 
2009). Large-scale studies have been restricted to model organ- 
isms for which gene deletion collections have been constructed, 
namely S. cerevisiae, S. pombe, and Escherichia coli K12 (Hillen-
meyer et al., 2008; Nichols et al., 2011; Parsons et al., 2006).
However, as none of these are pathogens, the extent to which 
the resulting insights translate to pathogenic organisms is un- 
known. A variation on chemogenomic profiling, chemically- 
induced haploinsufficiency, was first developed using a diploid 
heterozygote gene deletion library of S. cerevisiae to identify com-
pound MOA. This method, which identifies genes that impact
compound sensitivity based on a two-fold gene dosage change, 
is suited for diploid organisms and has been used in the path-
ogen C. albicans (Jiang et al., 2008; Xu et al., 2007; Xu et al.,
2009). 

We report here the generation of a large-scale chemogenomic 
map for C. neoformans using defined, commonly available 
knockout mutants, assessments of data quality, and extensive 
experimental verification. Comparisons of the C. neoformans 
profile with two large-scale published profiles from 
S. cerevisiae revealed that for most types of compounds, the 
chemical-genetic interactions are distinct even among ortholo- 
gous genes, emphasizing the importance of pathogen-focused 
investigation. We used nearest-neighbor analysis to predict 
new genes involved in polysaccharide capsule formation and 
infectivity, which we validated through experiment. We also uti- 
lized genetic responses to predict the G2/M phase of the cell 
cycle and the Cdc25 phosphatase as targets of a thiazolidone- 
2,4-dione derivative, which we confirmed in vivo and in vitro. 
Finally, because of the unmet need for improved antifungal 
drug efficacy, we developed a new algorithm, O2M, to predict
new compound synergies based on the profiles of pairs known 
to be synergistic. Experimental tests demonstrate that the 
method performs vastly better than random expectation, thereby 
enabling the identification of synergistic compound combina- 
tions. Our studies establish a chemical-genetic foundation to 
approach the biology and treatments of C. neoformans infec- 
tions, which are responsible for more than one-third of HIV/ 
AIDS deaths worldwide. 

RESULTS 

A Chemical-Genetic Map of C. neoformans 

We assembled 1,448 C. neoformans gene deletion strains (Chun
et al., 2011; Liu et al., 2008) (Table S1 available online), corre-
sponding to roughly one-fifth of the 6,967 predicted
C. neoformans genes (Janbon et al., 2014), and a collection of 
compounds for screening (Table 1). Compounds were selected 
based on cost and literature evidence that they could inhibit 
the growth of fungi. Where feasible, compounds were chosen 
that are known to target specific biological processes. For
each small molecule, we determined an approximate minimum
inhibitory concentration (MIC) in agar, then measured growth 
of the knockout collection on each small molecule at 50%, 
25%, and 12.5% MIC using high density agar plate colony arrays 
and a robotic replicator. We then measured the size of each col- 
ony using flatbed scanning and colony measurement software 
(Dittmar et al., 2010). We performed a minimum of four replicate 
colony measurements for each mutant-condition pair. Plate- 
based assays are subject to known nonbiological effects, such 
as spatial patterns. To mitigate these errors, a series of correc- 
tive measures were implemented using approaches described 
previously, including manual filtration of noisy data, spatial effect 
normalization and machine learning-based batch correction 
(Baryshnikova et al., 2010). In addition, the data for each deletion
mutant and compound was centered and normalized. Each 
mutant-small molecule combination was assigned a score with 
positive scores representing relative resistance and negative 
scores representing compound sensitivity (Table S2). A global 
summary of the processed data organized by hierarchical clus-
tering is shown in Figure 1A.
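
The centering and normalization step can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that raw scores are held as a list of rows (one row per deletion mutant, one column per compound condition) and that normalization means double-centering followed by scaling to unit standard deviation. The function name `center_and_scale` is hypothetical, and the published pipeline (Baryshnikova et al., 2010) applies further corrections that are not shown here.

```python
from statistics import mean, pstdev


def center_and_scale(matrix):
    """Double-center a score matrix, then scale it to unit standard deviation.

    matrix: list of rows; each row holds one mutant's raw scores across
    compound conditions (assumed layout, for illustration only).
    """
    # Center each mutant (row) on zero.
    rows = [[v - mean(row) for v in row] for row in matrix]
    n_cols = len(rows[0])
    # Center each compound (column) on zero.
    col_means = [mean([row[j] for row in rows]) for j in range(n_cols)]
    centered = [[row[j] - col_means[j] for j in range(n_cols)] for row in rows]
    # Scale so that scores are expressed in standard deviation units.
    sd = pstdev([v for row in centered for v in row])
    if sd == 0:
        raise ValueError("matrix has no variation left after centering")
    return [[v / sd for v in row] for row in centered]
```

Under these assumptions, a negative output value marks a mutant that grows worse than expected on a compound (sensitivity) and a positive value marks relative resistance.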

The importance and validity of the computational corrections
are shown in Figures 1B and S1. We estimated how reproducible
the chemical-genetic profiles were by calculating the correlation 
scores for data obtained for different concentrations of the same 
small molecule (purple). This measures the degree of overlap 
between the overall chemical-genetic profiles, which are them- 
selves each composed of a score for each mutant-small mole- 
cule combination. We found significant correlation (p = 2.67 x 
10“^^®) between data obtained for different concentrations of 
the same small molecule compared to those between profiles 
generated by data set randomization, suggesting significant 
reproducibility. Moreover, correlation scores between chemi-
cal-genetic profiles of different concentrations of different com-
pounds (gray) are centered at approximately 0 (Figure 1B). This
difference in correlation scores is apparent even when 
comparing experiments performed on the same day, when 
spurious batch signal can contribute to false positives (Baryshni- 
kova et al., 2010). Our batch-correction algorithms resulted in 
same-batch screening data with strong positive correlation 
scores for the same compounds but correlation scores close 
to zero for different compounds (Figure S1), demonstrating suc-
cessful removal of spurious signal (Baryshnikova et al., 2010).
We compared chemical-genetic profiles between compounds 
in the azole family (Figure 1C). Despite the fact that the azoles 
tested include those of diverse uses, from agricultural pesticides 
to FDA-approved drugs (Table 1), many exhibit a significant pro- 
file correlation (p = 2.82 x 10“®), further indicating significant 
signal in the data. As a final assessment, we performed hyper- 
geometric testing across all compounds to determine whether 
the same sensitive gene knockouts (defined by Z < -2.5) are 
identified at different concentrations of the same compounds. 
Using a Bonferroni-corrected p value cutoff, nearly all com-
pounds display significant overlap of responsive genes at
different concentrations (Figure 1D).
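
The concentration-overlap test described above can be sketched as follows. This is a minimal Python illustration rather than the authors' code: it assumes each profile is a dict mapping gene names to Z scores, applies the Z < -2.5 cutoff from the text, and computes the upper-tail hypergeometric probability with the standard library. The function names and the 0.05 significance level are assumptions.

```python
from math import comb


def sensitive_genes(profile, cutoff=-2.5):
    """Return the set of genes whose Z score falls below the cutoff."""
    return {gene for gene, z in profile.items() if z < cutoff}


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    total = comb(population, draws)
    top = min(successes, draws)
    favorable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, top + 1)
    )
    return favorable / total


def overlap_test(profile_a, profile_b, n_tests, alpha=0.05):
    """Test whether two concentrations of one compound share sensitive genes.

    Both profiles are assumed to cover the same set of mutants. Returns the
    p value and whether it passes a Bonferroni-corrected threshold
    (alpha divided by the number of tests performed).
    """
    a = sensitive_genes(profile_a)
    b = sensitive_genes(profile_b)
    p = hypergeom_upper_tail(len(a & b), len(profile_a), len(a), len(b))
    return p, p < alpha / n_tests
```

Dividing the significance level by the number of compounds tested is the Bonferroni correction; it keeps the chance of any false positive across the whole screen at or below the chosen level.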

We assigned at least one phenotype (sensitivity or resistance 
to a compound) to 1,198 of 1,448 mutants (Figure 1E, Tables S2,
S3, and S4). Of these, 855 exhibit one to ten phenotypes, while
the remaining 343 display from 11 to 146 phenotypes. Gene
Table 1. Small Molecules and Targets

Inhibitor (Activator) | Highest Screening Conc. | Process/Enzyme | Category | Pubchem ID | FDA Approval?
1-10 phenanthroline hydrochloride monohydrate | 2 uM | broad/transition metal complexes | broad spectrum | 2723715 | no
2-aminobenzothiazole | 30 uM | cytoskeleton function/kinesin Kip1 | cell structure | 8706 | no
2-hydroxyethylhydrazine | 0.156% | lipid synthesis/phospholipid methylation | lipid biosynthesis | 8017 | no
3-aminotriazole | 6.25 mM | histidine synthesis/IMP dehydratase | metabolism | 1639 | no
4-hydroxytamoxifen | 1.56 uM | estrogen receptor (mammals) | signaling | 449459 | yes
5-fluorocytosine | 2.5 ug/ml | DNA/RNA biosynthesis | DNA homeostasis/protein synthesis | 3366 | yes
5-methyltryptophan | 8 mM | tryptophan synthesis | metabolism | 150990 | no
Abietic acid | 1 mM | lipid synthesis/lipoxygenase | lipid biosynthesis | 10569 | no
Acifluorfen methyl | 156.25 ng/ml | porphyrin synthesis/protoporphyrinogen oxidase | metabolism | 91642 | no
(Aconitine) | 200 ug/ml | membrane potential/Na+ channels (mammals) | membrane polarization | 245005 | no
Aflatoxin B1 | 100 ug/ml | DNA damaging agent | DNA homeostasis | 14403 | no
Agelasine D | 5 ug/ml | membrane potential/Na+/K+-ATPase (mammals) | membrane polarization | 46231918 | no
Alamethicin/U-22324 | 60 uM | membrane integrity/forms a voltage-dependent ion channel | membrane polarization | 16132042 | no
Alexidine dihydrochloride | 125 ug/ml | antimicrobial/mitochondria | mitochondria | 102678 | yes
Allantoin | 100 ug/ml | nitrogen-rich compound | metabolism | 204 | topical
Alternariol | 2.5 ug/ml | cholinesterase inhibitor/sodium channel activator and DNA supercoiling/topoisomerase 1 | broad spectrum | 5359485 | no
Aluminum sulfate | 1.5625 mM | unknown | unknown | 24850 | no
(Amantadine hydrochloride) | 1.25 mM | neurotransmitter release/glutamate receptor | signaling | 64150 | yes
Amiodarone | 60 ug/ml | membrane potential/Na+/K+-ATPase (mammals) | membrane polarization | 2157 | yes
(Ammonium persulfate) | 50 mM | reactive oxygen species | apoptosis/stress response/damage response | 62648 | no
Amphotericin B | 1 ug/ml | lipid biosynthesis/ergosterol | membrane integrity | 5280965 | yes
Andrastin A | 4 ug/ml | protein modification/farnesyltransferase | protein trafficking | 6712564 | no
Anisomycin | 50 uM | translation/peptidyl transferase | gene expression | 253602 | no
Antimycin | 100 ug/ml | respiration/cytochrome B | metabolism | 14957 | no
Apicidin | 312.5 ng/ml | chromatin regulation/HDACs | gene expression | 6918328 | no
Artemisinin | 312.5 mM | iron metabolism/hematin detoxification | metabolism | 68827 | yes
Ascomycin | 3.125 uM | signaling/calcineurin | signaling | 6437370 | yes
Azide | 62.5 uM | respiration/cytochrome C oxidase | metabolism | 33558 | no
Barium chloride | 16 mM | metal homeostasis/diverse | broad spectrum/unknown | 25204 | no
Bafilomycin | 4 ug/ml | autophagy/vacuolar-type H+-ATPase | protein turnover | 6436223 | no
Bathocuproine disulphonic acid (BCS) | 3 mM | copper acquisition | metabolism | 16211287 | no
Bathophenanthroline disulfonate (BPS) | 300 uM | iron acquisition/Fet3-Ftr1 | metabolism | 65368 | no
Benomyl | 100 ug/ml | cytoskeleton function/tubulin | cell structure | 28780 | no
(Betulinic acid) | 64 ug/ml | protein degradation/proteasome | protein turnover | 64971 | no
Bifonazole | 50 ug/ml | lipid biosynthesis/HMG-CoA and ergosterol biosynthesis | membrane integrity | 2378 | no
Brefeldin A | 40 ug/ml | ER-Golgi Transport/ARF GEF | secretion | 5287620 | no
Calcium chloride | 16 mM | metal homeostasis/diverse | broad spectrum | 5284359 | no
Caffeine | 2.5 mM | DNA damage checkpoint/ATM | DNA homeostasis | 2519 | no
Calcium ionophore A23187 | 2.5 ug/ml | membrane integrity/peptide that acts as ionophore | membrane integrity | 40486 | no
Calcofluor white | 500 ug/ml | cell wall synthesis/chitin and cellulose | cell wall | 6108780 | no
Camptothecin | 500 ug/ml | DNA supercoiling/topoisomerase 1 | DNA homeostasis | 24360 | analog
Castanospermine | 2.4 mM | protein modification/glycosidation | protein modification | 54445 | derivative
Cadmium chloride | 1 mM | metal homeostasis/diverse | broad spectrum/unknown | 24947 | no
Cerulenin | 312.5 ng/ml | fatty acid synthesis/beta-ketoacyl-acyl carrier protein synthase | lipid biosynthesis | 5282054 | no
Cesium chloride | 128 mM | metal homeostasis/diverse | broad spectrum/unknown | 24293 | no
Chlorpromazine hydrochloride | 1.5625 uM | phenothiazine antipsychotic drug (mammals)/dopamine, serotonin, and other neuroreceptors | signaling | 6240 | yes
Chromium (III) chloride | 8 mM | metal homeostasis/diverse | broad spectrum | 16211596 | no
Ciclopirox olamine | 750 ng/ml | iron acquisition and other | metabolism | 38911 | yes
Cisplatin | 100 ug/ml | DNA synthesis | DNA homeostasis | 157432 | yes
Climbazole | 0.03125% | lipid biosynthesis/ergosterol biosynthesis and respiration/cytochrome P450 | broad spectrum | 37907 | topical
Clotrimazole | 500 nM | lipid biosynthesis/ergosterol biosynthesis | membrane integrity | 2812 | yes
Colistin | 1 mg/ml | membrane integrity | membrane integrity | 5311054 | yes
Congo red | 0.0625% | cell wall synthesis/chitin, cellulose, and glucan | cell wall | 11313 | no
Coniine | 0.15625% | neurosignaling (mammals)/nicotinic receptor | signaling | 441072 | no
(Crystal violet) | 0.0012500% | oxidative stress inducer | stress response | 11057 | topical
CuCl2 | 8 mM | copper homeostasis/diverse | metabolism | 24014 | no
Cycloheximide | 1.875 ug/ml | translation/ribosome | gene expression | 6197 | no
Cyclopiazonic acid | 15.625 uM | ion transport and cell polarization (mammals)/Ca2+-ATPase | metabolism | 54682463 | no
Cyclosporin | 75 ug/ml | signaling/calcineurin | signaling | 5284373 | yes
Cyproconazole | 1.5625 ug/ml | lipid biosynthesis/ergosterol biosynthesis | membrane integrity | 86132 | no
Cyprodinil | 10 ug/ml | methionine biosynthesis | metabolism | 86367 | no
Daphnetin | 100 uM | signaling/PKA, PKC, EGF receptor, others | signaling | 5280569 | no
Desipramine hydrochloride | 250 uM | neurosignaling (mammals)/norepinephrine transporter | signaling | 65327 | yes
Dyclonine hydrochloride | 3.125 uM | lipid biosynthesis/ergosterol biosynthesis | membrane integrity | 68304 | yes
Emetine dihydrochloride hydrate | 5 mM | translation/ribosome | gene expression | 3068143 | yes
Emodin | 62.5 uM | signaling/CK2, others | signaling | 3220 | yes
Erlotinib | 50 ug/ml | signaling (mammals)/EGFR tyrosine kinase | signaling | 176870 | yes
FeCl3 | 32 mM | iron acquisition, metal homeostasis | metabolism | 24380 | no
Fenoxanil | 80 ug/ml | melanin biosynthesis | metabolism | 11262655 | no
Fenpropimorph | 2.5 ug/ml | sterol synthesis | lipid biosynthesis | 93365 | no
FK506 | 312.5 ng/ml | signaling/calcineurin | signaling | 445643 | yes
Fluconazole | 10 ug/ml | lipid biosynthesis/ergosterol biosynthesis | membrane integrity | 3365 | yes
Fluspirilene | 25 uM | antipsychotic drug, mechanism of action unknown | unknown | 3396 | yes
Gallium (III) nitrate | 25 mM | metal homeostasis/diverse | broad spectrum | 57352728 | no
Geldanamycin | 2 uM | protein folding/Hsp90 | protein folding | 5288382 | trials
(H2O2) | 6 mM | reactive oxygen species | apoptosis/stress response/damage response | 784 | topical
Haloperidol | 125 uM | phenothiazine antipsychotic drug (mammals)/dopamine, serotonin, and other neuroreceptors | signaling | 3559 | yes
Harmine hydrochloride | 1 mM | cell differentiation (mammals)/PPARgamma | signaling | 5359389 | yes
Hydroxyurea | 12.5 mM | DNA replication/replication fork progression | DNA homeostasis | 3657 | yes
Hygromycin | 37.5 ug/ml | translation/ribosome | gene expression | 35766 | no
Imazalil | 25 ug/ml | lipid biosynthesis/ergosterol synthesis | membrane integrity | 37175 | no
Iodoacetate | 500 uM | protein degradation/cysteine peptidases | protein turnover | 5240 | no
Itraconazole | 1.5625 ug/ml | lipid biosynthesis/ergosterol synthesis | membrane integrity | 55283 | yes
K252a | 10 ug/ml | signaling/variety of kinases | signaling | 127357 | trials
Latrunculin | 25 uM | cytoskeleton function/actin | cell structure | 445420 | no
Lead (II) nitrate | 64 mM | metal homeostasis/diverse | broad spectrum | 24924 | no
Leptomycin | 1.25 ug/ml | nucleocytoplasmic transport/Crm1 | gene expression | 6917907 | no
LiCl | 37.5 mM | metal homeostasis/diverse | broad spectrum/unknown | 433294 | no
Lovastatin | 37.5 ug/ml | sterol synthesis/HMG CoA reductase | metabolism | 53232 | yes
LY 294002 | 375 uM | signaling/PI3K | signaling | 3973 | no
Magnesium chloride | 150 mM | metal homeostasis/diverse | broad spectrum | 21225507 | no
Malachite green | 3.125 ug/ml | antimicrobial/unknown | antimicrobial | 11294 | no
Manganese sulfate | 128 mM | metal homeostasis/diverse | metabolism | 177577 | no
Mastoparan | 5 uM | signaling/G-proteins | signaling | 5464497 | no
(Menadione) | 150 uM | vitamin K3/reactive oxygen species | diverse | 4055 | yes
Menthol | 1 mM | voltage-dependent ion channels (mammals)/sodium channel | signaling | 16666 | yes
Methotrexate | 2.5 uM | folate synthesis/DHFR | metabolism | 126941 | yes
Methyl methanesulfonate (MMS) | 0.0165% | DNA replication/replication fork progression | DNA homeostasis | 4156 | no
Methylbenzethonium chloride (MBT) | 0.25% | antimicrobial | antimicrobial | 5702238 | topical
MG132 | 12.5 uM | protein degradation/proteasome | protein turnover | 462382 | no
Miconazole | 6.25 ug/ml | lipid biosynthesis/ergosterol synthesis | membrane integrity | 4189 | yes
Mitomycin C | 12 uM | DNA damaging agent | DNA homeostasis | 5746 | yes
Myclobutanil | 2 ug/ml | lipid biosynthesis/ergosterol synthesis | membrane integrity | 6336 | no
Mycophenolic acid | 2.5 ug/ml | GMP synthesis/IMP dehydrogenase | metabolism | 446541 | yes
Myriocin | 12.5 ug/ml | sphingolipid synthesis | metabolism | 6438394 | analog
NA8 |  | unknown | unknown |  | no
(NaCl) | 37.5 mM | osmotic regulation/HOG pathway | stress response | 5234 | yes
(NaNO2) | 150 uM | reactive nitrogen species | stress response | 23668193 | no
Neomycin sulfate | 2.4 mM | protein synthesis/ribosome | gene expression | 8378 | yes
Nicotinamide | 25 uM | chromatin regulation/sirtuins | gene expression | 936 | yes
Nigericin | 100 ug/ml | membrane integrity/ion gradient | membrane polarization | 34230 | no
Nikkomycin | 5 ug/ml | chitin synthesis | cell wall | 72479 | trials
NiSO4 | 1 mM | antifungal/diverse | antifungal | 5284429 | no
Nocodazole | 30 uM | cytoskeleton function/tubulin | cell structure | 4122 | no
Ophiobolin A | 62.5 ng/ml | signaling/calmodulin | signaling | 5281387 | no
Parthenolide | 150 uM | immune and inflammatory response/NF-kB | signaling | 6473881 | no
Pentamidine isethionate | 500 uM | antimicrobial/mitochondrial function | antimicrobial | 8813 | yes
pH | 8.0, 8.5, 9.0 | pH homeostasis | diverse |  | no
Phenylarsine oxide | 2.5 uM | broad/XCXXCX protein phosphatases | broad spectrum | 4778 | no
Picoxystrobin | 6.25 ug/ml | quinone outside inhibitor class/fungal cytochrome bc1 | mitochondria | 11285653 | no
(Plumbagin) | 2.8 uM | reactive oxygen species | stress response | 10205 | no
PMSF | 10 mM | vacuolar proteolysis/proteinase B | signaling | 4784 | no
Polyoxin B | 200 ug/ml | chitin synthesis | cell wall | 3084093 | no
Povidone iodine | 2% | antimicrobial | antimicrobial | 410087 | topical
Prussian blue | 75 mM | monocation chelator | metabolism | 16211064 | yes
Quinic acid | 2 mM | antimicrobial | antimicrobial | 6508 | no
Rapamycin | 0.125 uM | signaling/TOR kinases | signaling | 5284616 | yes
Rubidium chloride | 150 mM | potassium metabolism/competitor | metabolism | 62683 | no
Rifamycin SV monosodium salt | 200 ug/ml | RNA synthesis/RNA polymerase | gene expression | 6324616 | yes
S10 |  | unknown | unknown |  | no
S8 |  | unknown | unknown |  | no
S-aminoethyl-L-cysteine (thialysine) | 10 uM | amino acid metabolism/lysine analog | metabolism | 20048 | no
SDS | 0.0015625% | cell membrane integrity | membrane integrity | 3423265 | no
Selumetinib | 150 ug/ml | signaling/MAPK (ERK) | signaling | 10127622 | trials
Sertraline | 15 ug/ml | neurosignaling (mammals)/serotonin reuptake | neurosignaling | 68617 | yes
Sodium azide | 62.5 uM | respiration/cytochrome oxidase | mitochondria | 33557 | no
Sodium borate | 10 mM | antimicrobial/diverse | antimicrobial | 21749317 | no
Sodium hydrosulfite | 6.25 mM | antimicrobial, counteracts some antimicrobials | antimicrobial | 24489 | no
Sodium iodide | 75 mM | antimicrobial | antimicrobial | 5238 | yes
Sodium metavanadate | 10 mM | signaling/protein phosphotyrosine phosphatases | signaling | 4148882 | no
(Sodium molybdate) | 64 mM | respiration/oxygen uptake | diverse | 61424 | no
Sodium selenite | 4 mM | respiration/oxygen uptake | diverse | 16210997 | yes
Sodium sulfite | 100 mM | ATP synthesis and accumulation/unknown | metabolism | 24437 | no
Sodium tungstate | 64 mM | metal homeostasis/diverse | broad spectrum/unknown | 150191 | no
Sorafenib | 100 uM | signaling/VEGF tyrosine kinase | signaling | 216239 | yes
Staurosporine | 3 uM | signaling/PKC1 | signaling | 5279 | yes
(STF-62247) | 400 uM | autophagy | protein turnover | 704473 | trials
Sulfometuron methyl | 100 ug/ml | branch chain amino acid synthesis/acetolactate synthase | metabolism | 52997 | no
Suloctidil | 400 uM | Ca2+ homeostasis in blood vessels (mammals)/putative Ca2+ channel blocker | vascular system/metabolism | 5354 | formerly
Tamoxifen citrate | 10 uM | estrogen signaling (mammals)/estrogen receptor, mixed agonist/antagonist | signaling | 2733525 | yes
Taurolidine | 0.01% | antimicrobial/lipopolysaccharide detection and signaling | host defense | 29566 | yes
Tautomycin | 250 nM | signaling/PP2A | signaling | 3034761 | no
Tellurite | 0.1% | sulfate assimilation | metabolism | 115037 | no
Terbinafine | 75 uM | sterol synthesis/squalene epoxidase | metabolism | 1549008 | yes
Thiabendazole | 200 ug/ml | respiration/NADH oxidase | mitochondria | 5430 | yes
Thonzonium bromide | 25 uM | antimicrobial, pH homeostasis/V-ATPase | broad spectrum | 11102 | yes
Tomatine | 5 ug/ml | glycoalkaloid antifungal of unknown mechanism/ergosterol biosynthesis | antifungal/membrane integrity | 28523 | no
Trichostatin A | 100 uM | chromatin regulation/HDACs | gene expression | 444732 | no
Trifluoperazine | 200 uM | signaling/calmodulin | signaling | 5566 | yes
Trimethoprim | 1.6 mg/ml | folate synthesis/DHFR | metabolism | 5578 | yes
Tunicamycin | 2.5 ug/ml | glycosylation/Alg7 | secretion | 11104835 | no
Usnic acid | 25 ug/ml | antimicrobial | antimicrobial | 6433557 | trials
Valinomycin | 20 uM | membrane integrity/potassium exclusion | membrane polarization | 5649 | no
Verrucarin | 5 uM | protein biosynthesis/polysome | protein turnover | 6437060 | no
ZnCl2 | 4 mM | metal homeostasis/diverse | diverse | 5727 | no



A list of compounds used in this study, their targets, and the screening concentration. 
deletions with the greatest number of phenotypes are
cnag_07622Δ (encoding the COP9 signalosome subunit 1) and
cnag_05748Δ (encoding a Nto1 subunit of the NuA3 histone ace-
tyltransferase). Compounds that elicit the greatest number of
responsive gene deletions (Figure 1F) are the heavy metal salt
sodium tungstate and the trichothecene protein synthesis inhib- 
itor verrucarin (Table S5), presumably reflecting the pleiotropic 
impact of these molecules on cells. 

Gene Ontology Analysis Reveals Processes Associated 
with Drug Sensitivity 

Drug influx and efflux are thought to be major general determinants
of microbial drug susceptibility (Fernandez and Hancock, 2012),
but we also sought other functions involved in drug sensitivity.
We investigated this question in an unbiased fashion by
analyzing chemogenomic profiles using Gene Ontology (GO), a
gene annotation approach useful for comparative analyses. We
first identified annotated orthologs of C. neoformans genes
represented in the deletion library and associated GO terms with
these orthologs. We then determined whether the sensitive
gene knockouts that respond to each small molecule are
enriched for association with particular GO terms relative to a
randomized control set (Figure 2, Table S6). We observed that
protein transport-related terms are highly enriched, as are
processes related to ubiquitin modification/proteolysis and
vesicle-mediated transport. These terms are associated with
nine and five compounds, respectively, suggesting that
intracellular transport and ubiquitin-mediated protein turnover may play
important general roles in drug sensitivity.
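
The enrichment test described above can be sketched as follows. This is a minimal Python illustration, assuming that the "randomized control set" is a random draw of genes of the same size as the responsive set; the function name, the number of permutations, and the gene identifiers are hypothetical, and the exact randomization procedure used in the study is not stated in this section.

```python
import random


def go_enrichment_pvalue(responsive, library, go_genes, n_perm=10000, seed=0):
    """Empirical p value for enrichment of a GO term among responsive knockouts.

    responsive: genes whose knockouts respond to one compound (assumed input)
    library:    all genes represented in the deletion library
    go_genes:   genes annotated with the GO term of interest
    """
    rng = random.Random(seed)
    pool = sorted(library)               # fixed order for reproducible sampling
    responsive = set(responsive)
    observed = len(responsive & go_genes)
    hits = 0
    for _ in range(n_perm):
        # Randomized control: a gene set of the same size drawn from the library
        control = rng.sample(pool, len(responsive))
        if sum(gene in go_genes for gene in control) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero
    return observed, (hits + 1) / (n_perm + 1)
```

Repeating this calculation for every compound and GO term, and then counting the compounds for which a term is enriched, would give a tally of the kind plotted in Figure 2.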

Comparison with S. cerevisiae Chemogenomic Profiling 
Data Sets 

Chemogenomic profiling has been performed extensively in
S. cerevisiae, allowing us to ask whether genetic responses to
compounds were conserved. We performed a three-way
comparison with two large-scale studies (Hillenmeyer et al., 2008;
Parsons et al., 2006) (Figure 3A). Our data set has 46 compounds
in common with Parsons et al. (2006) and 29 with Hillenmeyer
et al. (2008); the two S. cerevisiae data sets had 15 compounds
in common. First we identified genes whose knockouts exhibited
a significant score (Z < -2.5 or > +2.5; “responding”) when
treated with a small molecule used in more than one data set,
then identified which of those genes had orthologs in both
S. cerevisiae and C. neoformans. We then calculated how
many orthologs responded in both data sets. To adjust for a
greater starting number of common genes when comparing
the S. cerevisiae data sets to each other and to control for
functional biases, we limited this comparison to genes that also
have orthologs in the C. neoformans knockout collection. The
blue labels for compounds in Figures 3B-3D indicate statistically
significant similarities (p < 0.05) in drug responses. Nearly all of
the compounds in common between the two S. cerevisiae
studies display statistically significant overlap in the genes that
produced sensitivity to a given compound, despite the very
different experimental platforms that were used to assess drug
sensitivity/resistance (13/15 cases; Figure 3B). In striking
contrast, few compounds show significantly conserved genetic
responses when comparing either S. cerevisiae data set with
the C. neoformans data. For the two C. neoformans-S. cerevisiae
comparisons, only two of 46 compounds (Figure 3C) and one
of 29 compounds (Figure 3D) show conserved responses,
respectively.
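
The comparison described above can be illustrated with a short Python sketch. It assumes that the significance of the overlap between two responding gene sets is assessed with a one-sided hypergeometric test restricted to the shared ortholog universe; the section does not name the test used for this comparison, and the function names and data layout are hypothetical.

```python
from math import comb


def responding(z_scores, threshold=2.5):
    """Genes whose knockout score satisfies |Z| > threshold."""
    return {gene for gene, z in z_scores.items() if abs(z) > threshold}


def overlap_pvalue(set_a, set_b, universe):
    """Overlap size and hypergeometric P(overlap >= observed).

    Only genes present in `universe` (orthologs found in all data sets)
    are considered, mirroring the restriction described in the text.
    """
    a = set_a & universe
    b = set_b & universe
    n_total, n_a, n_b = len(universe), len(a), len(b)
    k = len(a & b)
    # Sum the upper tail of the hypergeometric distribution with exact integers
    tail = sum(
        comb(n_a, i) * comb(n_total - n_a, n_b - i)
        for i in range(k, min(n_a, n_b) + 1)
    )
    return k, tail / comb(n_total, n_b)
```

For a compound shared by two data sets, `responding` would be applied to each profile and `overlap_pvalue` to the two resulting sets; a p value below 0.05 would correspond to a blue label in Figures 3B-3D under this assumption.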

Responses to azole compounds show limited conservation
between species. Comparing our data set with
Parsons et al. (2006), the responses to fluconazole (FLC) and
clotrimazole, the azoles in both data sets, do not show significant overlap
(Figure 3C). Likewise, between our data set and Hillenmeyer
et al. (2008), no gene orthologs respond to miconazole and
clotrimazole in both data sets (Figure 3D). In contrast, between the
two S. cerevisiae data sets, the only shared azole, clotrimazole,
shows a significantly similar response (Figure 3B). We also compared
published work that examined the transcriptome responses of
S. cerevisiae (Kuo et al., 2010) and C. neoformans (Florio et al.,
2011) to FLC. We found that, while there was significant overlap
in orthologous genes impacted in the two species (p = 1.6 x
10“^), there were also considerable differences: 67% of the
genes with an altered response in C. neoformans have
orthologs in S. cerevisiae that did not exhibit significant change (Table
S7) (Kuo et al., 2010).

Using Chemical-Genetic Signatures to Identify Capsule 
Biosynthesis Mutants 

Studies in S. cerevisiae have shown that the phenotypic
signatures of gene deletions for genes that act in the same process
or protein complex tend to be similar (Collins et al., 2007;
Costanzo et al., 2010; Nichols et al., 2011; Parsons et al., 2004;
Parsons et al., 2006). We reasoned that this property could be
used in a pathogen to identify candidates for new genes involved
in virulence by simply testing gene deletions that displayed
phenotypic profiles similar to those corresponding to known
virulence factors.

C. neoformans harbors an inducible polysaccharide capsule
that is unusual among fungi (Del Poeta, 2004; Doering, 2009;
Haynes et al., 2011; Kumar et al., 2011; O’Meara and Alspaugh,
2012; O’Meara et al., 2010; Vecchiarelli et al., 2013). The
principal polysaccharide component, glucuronoxylomannan
(GXM), consists of a repeating glycan unit that has an α-1,3-linked
mannose backbone with side chains of β-linked glucuronic
acid and xylose (Kozel et al., 2003). Capsule production is critical
for virulence and the ability of C. neoformans to evade detection
and destruction by the host immune system (Vecchiarelli et al.,
2013).

Figure 2. Determinants of Compound Sensitivity

We calculated whether molecules elicited a significant response from
C. neoformans ORFs that are enriched for association with specific GO
terms. Terms are listed on the y axis and the number of compounds whose
responding gene knockouts are associated with that GO term is listed
on the x axis. See also Table S6.

To identify candidates for genes involved in capsule formation
and/or attachment, we organized our data set using hierarchical
clustering of growth phenotypes produced by compound
exposure. We focused on two clusters, each containing one or more genes
previously implicated in capsule biosynthesis: PBX1 and CPL1 (Liu
et al., 2008; Liu et al., 2007b) in one cluster (Figure 4A) and
CAP60 (Chang and Kwon-Chung, 1998) in a second cluster
(Figure 4B). The pbx1Δ/cpl1Δ cluster contains nine genes and the
cap60Δ cluster seven. We quantified capsule accumulation after
induction by computing the ratio of the diameter of the cell and
capsule to the diameter of the cell alone (Figures 4C and 4D).
Wild-type cells exhibit high capsule production, pbx1Δ mutants
display a partial defect (Liu et al., 2007a), and cpl1Δ and cap60Δ
mutants are acapsular (Chang and Kwon-Chung, 1998; Liu et al.,
2008). We found that seven of nine mutants in the pbx1Δ/cpl1Δ
cluster exhibit a statistically significant capsule defect, as did
four of the seven mutants in the cap60Δ cluster. In contrast,
previous work from our laboratory found that approximately 1% of
the original C. neoformans library shows a gross defect in
capsule production (Liu et al., 2008).
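
The capsule measurement described above is a simple ratio, sketched here in Python. The function names and the example diameters are hypothetical; only the definition of the ratio (diameter of cell plus capsule divided by diameter of the cell alone) comes from the text.

```python
from statistics import mean, stdev


def capsule_ratio(total_diameter, cell_diameter):
    """Ratio of (cell + capsule) diameter to cell-body diameter.

    A value of 1.0 would correspond to a cell with no visible capsule.
    """
    if cell_diameter <= 0:
        raise ValueError("cell diameter must be positive")
    return total_diameter / cell_diameter


def summarize_strain(measurements):
    """Mean and standard deviation of the ratio over (total, cell) pairs."""
    ratios = [capsule_ratio(total, cell) for total, cell in measurements]
    return mean(ratios), stdev(ratios)
```

Per-strain summaries of this kind could then be compared with wild-type values, for example with the Student's t test mentioned in the Figure 4 legend.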

Previous work showed that pbx1Δ mutants produce
polysaccharide capsule whose attachment to the cell wall is sensitive to
sonication, a finding that we confirmed (Figures 4C and 4D). We
refer to the cell’s ability to retain GXM on the cell surface as
“capsule maintenance.” Knockout mutants in cnag_01058 do
not exhibit a basal capsule defect but lose nearly 40% of their
capsule diameter following sonication. Cells deleted for the
GCN5 gene, like pbx1Δ cells, show both decreased capsule
levels and sonication-sensitive capsule. None of the mutants
from the cap60Δ cluster produces a sonication-sensitive
phenotype, suggesting that the pbx1Δ/cpl1Δ and cap60Δ clusters
organize mutants that have distinct phenotypes. However,
because several mutants do not produce visible capsule, the
sonication test is insufficient to definitively measure capsule
maintenance. We therefore analyzed how much
glucuronoxylomannan (GXM), the major
capsular polysaccharide (Doering, 2009),
is secreted into the growth medium by
blotting with α-GXM antibodies (Figure S2A).
We found that two mutants
that produce little (gcn5Δ) or no (yap1Δ)
visible capsule still shed GXM into the
medium, suggesting that they cannot
retain capsule on their cell surface.
Indeed, we found that they shed more GXM than pbx1Δ cells.
Four of nine mutants in the pbx1Δ/cpl1Δ cluster exhibit a
maintenance defect, whereas none of the cap60Δ cluster mutants do.
We also found that GXM produced by these cells can be taken
up and added to the surface (“donated”) of an acapsular mutant
using a standard GXM transfer assay (Kozel and Hermerath,
1984; Reese and Doering, 2003). Moreover, apparently
capsule-defective mutants shed GXM (Figures S2B and S2C) and can
donate GXM from conditioned medium (Figure S2C). Mutants
that appear to not secrete GXM (pbx1Δ, cpl1Δ, and sgf73Δ)
can donate it, but only if conditioned medium concentration is
increased 10-fold (Figure S2D). These data are consistent with
a recently published study on the role of Pbx1 in capsule
attachment and assembly (Kumar et al., 2014).

Figure 1. Chemical-Genetic Profiling of C. neoformans

(A) Heat map of full data set following hierarchical clustering. Compounds are arrayed on the x axis and gene knockouts on the y axis. See also Tables S1 and S2.

(B) Probability density function for pairwise correlation scores between the chemical-genetic profiles of different compounds (gray) and the same compounds at
different concentrations (purple) screened on different days (different batches). Scores between the chemical-genetic profiles of different concentrations of the
same compounds are significantly higher than those between different compounds (Wilcoxon test, p = 2.7 x 10“^^®). See also Figure S1.

(C) Probability density function for pairwise correlation scores between the chemical-genetic profiles of different compounds (gray) and azole family compounds
(purple). Pairwise comparisons between azoles exhibit higher correlation scores than nonazole compounds (Wilcoxon test, p = 2.8 x 10“®). Molecules with the
highest pairwise comparison scores are listed on the right.

(D) Pearson’s correlation score between two different concentrations of the same compounds. Concentrations with similar correlation scores are binned together
(y axis). For compounds with the greatest correlation scores between concentrations, Venn diagrams of significant genes (Z < -2.5) present in profiles from the
same compounds at different concentrations and the small-molecule structure are shown. The orange line indicates a hypergeometric p value < 0.05.

(E) Histogram showing the number of deletion mutants that have a given number of phenotypes. A phenotype is defined as |Z| > 2.5, and we identified phenotypes
independently for each small-molecule concentration.

(F) Histogram showing the number of small molecules that have a given number of phenotypes. Phenotypes (|Z| > 2.5) were identified for each small-molecule
condition/concentration.

See also Figure S1 and Tables S1, S2, S3, S4, and S5.

Figure 3. Chemical-Genetic Signatures of C. neoformans Genes Differ from Orthologous S. cerevisiae Genes

(A) Flowchart of computation process for comparing data sets. We identified C. neoformans and S. cerevisiae orthologous genes that were present in all data sets,
then compared the responses of only those genes in all the data sets. We compared genes whose knockout mutants significantly (|Z| > 2.5) responded to
compounds that were common to at least two of the data sets.

(B) Comparison between Parsons et al. (2006) and Hillenmeyer et al. (2008), comparing the response (|Z| > 2.5) of genes that have orthologs present in the
C. neoformans data set. Compounds whose profiles exhibit significant overlaps (p < 0.05) are labeled in blue.

(C) Comparison between our data set and Parsons et al. (2006). Compounds whose profiles exhibit significant overlaps (p < 0.05) are labeled in blue.

(D) Comparison between our data set and Hillenmeyer et al. (2008). Compounds whose profiles exhibit significant overlaps (p < 0.05) are labeled in blue.

Since the capsule is a major virulence trait of C. neoformans, we
tested whether knockout mutants that exhibited a capsule defect
displayed a defect in the mammalian host, using a murine
inhalation model. We infected mice with a mixture of differentially
tagged wild-type and mutant cells at a ratio of 1:1. At 10 days
postinfection (dpi), we sacrificed animals, harvested and
homogenized lung tissue, then plated on the appropriate selective media
for colony forming units (CFUs). All but one of the pbx1Δ/cpl1Δ
cluster members were significantly underrepresented relative to
wild-type; the exception was the cnag_01058Δ mutant, which is
defective in capsule maintenance but not capsule biosynthesis
(Figures 4C and S2A). yap1Δ cells, which appear acapsular but
secrete GXM, displayed a major defect in fitness in the host
(Figure 4E). Three of four cap60Δ cluster mutants also display a defect
in accumulation of CFUs in host lungs (Figure 4E).
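
One common way to express the outcome of a mixed infection such as the one described above is a competitive index, sketched here in Python. Whether the study used exactly this statistic is an assumption; the function name and the example colony counts are hypothetical.

```python
from math import log2


def competitive_index(mutant_out, wt_out, mutant_in=1.0, wt_in=1.0):
    """Ratio of mutant to wild-type CFUs recovered, normalized to the inoculum.

    The default inoculum ratio of 1:1 matches the infection mixture described
    in the text. Values below 1 indicate that the mutant is underrepresented.
    """
    if min(mutant_out, wt_out, mutant_in, wt_in) <= 0:
        raise ValueError("all counts must be positive")
    return (mutant_out / wt_out) / (mutant_in / wt_in)


# Hypothetical counts: 100 mutant and 400 wild-type colonies from one lung
ci = competitive_index(100, 400)
log2_ci = log2(ci)  # negative values mean reduced fitness relative to wild-type
```

Reporting the index on a log scale makes equal fold-changes in either direction symmetric around zero.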

Chemogenomics Identifies the Cell Cycle as a Target 
of the Antifungal Small Molecule S8 

We included a number of drug-like antifungal compounds in our
screen in order to identify their targets (Table 1). Our use of
C. neoformans chemogenomics to assist in the identification of
a target of toremifene is described elsewhere (Butts et al.,
2014). Here we investigate the thiazolidine-2,4-dione derivatives
originally described for their activity against C. albicans biofilms
(Kagan et al., 2014).

Our chemogenomic profiling data of the thiazolidine-2,4-dione
derivative S8 revealed a striking outlier: a knockout mutant in the
gene coding for a C. neoformans ortholog of the conserved
cell-cycle kinase Wee1 is relatively resistant (Figure 5A). We
observed resistance at multiple concentrations of S8 (Table
S2). The related compound NA8, which contains a replacement
of a sulfur atom with a carbon atom on the thiazolidinedione
moiety (Figure 5B), does not elicit the same resistance (Figure S3A).
The wee1Δ mutant is also resistant to S10 (Figure S3B), which
harbors a C10 alkyl chain instead of C8 but is otherwise identical
to S8 (Figure S3C).

Wee1 regulates the G2/M cell-cycle checkpoint through
inhibitory phosphorylation of Cdk1, which in turn is required for cells
to traverse the checkpoint. The essential phosphatase Cdc25
activates Cdk1 by removing the inhibitory phosphorylation
added by Wee1 (Morgan, 2007) (Figure 5C). Because the
wee1Δ mutant is relatively resistant to S8, we hypothesized that S8
targeted a protein that acts through Wee1 to regulate Cdk1. One
such target could be Cdc25.

We reasoned that if the Wee1/Cdc25-regulated step of the cell
cycle were an important target of S8 in vivo, wild-type
C. neoformans cells treated with S8 would arrest at G2/M. To
test this prediction, we treated exponential cultures with S8,
S10, or NA8 and examined the impact on the cell cycle. We
harvested and fixed representative samples every 30 min, then
analyzed DNA content by flow cytometry. Control cultures
treated with DMSO (carrier) (Figure 5D) or the control compound
NA8 (Figure 5E) stayed asynchronous for the entire 3.5 hr of the
time course. Strikingly, S8-treated (Figure 5F) cells accumulated
with 2C DNA content, which indicates G2/M arrest in
C. neoformans, a haploid yeast (Whelan and Kwon-Chung,
1986). At later time points, cells synthesize DNA but do not
complete mitosis and cytokinesis. This is consistent with
observations in S. pombe that partial inhibition of Cdk1 permits
rereplication of DNA (Broek et al., 1991).

Because inhibition of Cdc25 would provide a parsimonious
explanation for the genetic and biological properties of S8, we
tested whether S8 inhibits C. neoformans Cdc25 in vitro. We
expressed and purified the catalytic domain of a C. neoformans
ortholog (CNAG_01572) in E. coli (Figure S3D) and then performed
in vitro phosphatase assays using 3-O-methyl fluorescein
phosphate (OMFP) as a substrate (Figures 5G and 5H) (Hill et al., 1968).
We observed that S8 inhibits Cdc25 activity (Ki ~140 µM,
Figure 5E), as do both S10 (Figure S3E) and NSC 663284 (Ki ~250,
Figure S3F), a commercially available inhibitor of mammalian
Cdc25 (Pu et al., 2002). The control compound NA8 does not
inhibit C. neoformans Cdc25 in vitro (Figure S3G). For S8, the
in vitro inhibition constant is roughly comparable to the liquid
MIC value against C. neoformans, which we measured to be
~60 µM in YNB. Compared to S8, S10 has a higher Ki (~310 µM) but
a similar MIC value (~55 µM), measured in YNB agar.
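
To make the relationship between inhibitor concentration and Ki concrete, the following Python sketch evaluates a pure noncompetitive inhibition model, the model reported in the Figure 5 legend as giving the best fit for S8. The Vmax and Km values in the usage lines are hypothetical placeholders, since they are not given in this section; only the Ki of ~140 µM and the ~60 µM concentration come from the text.

```python
def noncompetitive_velocity(s, i, vmax, km, ki):
    """Reaction velocity under pure noncompetitive inhibition.

    v = Vmax * [S] / ((Km + [S]) * (1 + [I] / Ki))
    The inhibitor lowers the apparent Vmax and leaves Km unchanged.
    """
    return vmax * s / ((km + s) * (1.0 + i / ki))


def fractional_activity(i, ki):
    """Activity remaining relative to the uninhibited enzyme at any [S]."""
    return 1.0 / (1.0 + i / ki)


# Hypothetical Vmax and Km (arbitrary units); Ki ~140 uM is taken from the text
v_uninhibited = noncompetitive_velocity(s=50.0, i=0.0, vmax=1.0, km=25.0, ki=140.0)
v_at_ki = noncompetitive_velocity(s=50.0, i=140.0, vmax=1.0, km=25.0, ki=140.0)
remaining_at_60 = fractional_activity(60.0, 140.0)  # S8 at roughly its MIC
```

In this model an inhibitor concentration equal to Ki halves the velocity at every substrate concentration, which is why Ki can be compared directly with growth-inhibitory concentrations as a rough guide.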

O2M: A Genetic Biomarker Algorithm to Predict
Compound Synergies

Drug resistance is a major clinical challenge in the treatment of
both bacterial and fungal infections (Anderson, 2005; Cantas
et al., 2013). An effective therapeutic strategy is to treat patients
with drugs that act synergistically, enhancing each other’s
effectiveness beyond that produced by the sum of each drug’s
individual impact (Kalan and Wright, 2011). This approach is thought
to decrease the acquisition of drug resistance, increase the available
drug repertoire (Kalan and Wright, 2011), and ameliorate
toxicities (Kathiravan et al., 2012; Lehar et al., 2009).

We hypothesized that we could use the chemogenomic
information from our screens of drugs known to act synergistically,
such as FLC and fenpropimorph (Jansen et al., 2009), to identify
new synergistic interactions (Figure 6A). When we compared the
identity of genes whose knockouts “responded” to each
individual small molecule in a known synergistic pair (|Z| > 2.5, Tables
S3 and S4), we found that this “responsive” gene set was
significantly enriched over the expected value (Fisher’s exact test, p <
6 x 10“®) (Figure 6A, top). This observation is consistent with a
previous report that the chemical-genetic response to each
drug in a synergistic pair is enriched for overlapping genes
(Jansen et al., 2009).

This overlap in responsive gene sets led us to consider the
possibility that overlapping responsive genes from known
synergistic compound pairs could be used as biomarkers to predict
new synergistic combinations. Our method involves first
identifying the overlaps in responsive gene sets for all compounds
that had been reported in the literature to synergize with a small
molecule of interest (“compound X”), then selecting those genes
common to all of those sets (Figure 6A, middle, the overlaps of
overlaps). We refer to these genes as “synergy biomarker
genes.” Critically, we next hypothesized that any compound
that contains one or more of these synergy biomarker genes in
its responsive gene set would be synergistic with compound X.
Because our method used the overlaps of response gene
overlaps between compounds known to be synergistic, we refer to it
as the “overlap-squared method” or “O2M.”
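
The two-step logic described above can be sketched with set operations in Python. This is one reading of the method, assuming each first-level overlap is taken between the responsive gene set of compound X and that of a known synergistic partner; the function names, compound labels, and gene labels are hypothetical and do not reproduce the study's data.

```python
from functools import reduce


def synergy_biomarkers(responsive, compound_x, known_partners):
    """Genes common to every overlap between X and a known synergistic partner."""
    overlaps = [responsive[compound_x] & responsive[p] for p in known_partners]
    if not overlaps:
        return set()
    # "Overlap of overlaps": intersect the pairwise overlaps with each other
    return reduce(set.intersection, overlaps)


def predict_synergizers(responsive, compound_x, known_partners):
    """Compounds whose responsive gene set contains at least one biomarker gene."""
    markers = synergy_biomarkers(responsive, compound_x, known_partners)
    excluded = {compound_x, *known_partners}
    return sorted(
        compound
        for compound, genes in responsive.items()
        if compound not in excluded and genes & markers
    )


# Toy responsive gene sets (hypothetical labels, |Z| > 2.5 assumed upstream)
toy = {
    "X": {"g1", "g2", "g3"},
    "partner1": {"g1", "g2", "g9"},
    "partner2": {"g1", "g2", "g3"},
    "candidate1": {"g2", "g7"},
    "candidate2": {"g8"},
}
predicted = predict_synergizers(toy, "X", ["partner1", "partner2"])
```

In this toy example the biomarker genes are g1 and g2, so only candidate1 is predicted to synergize with X; predictions of this kind would still need experimental confirmation.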

Figure 4. Chemical-Genetic Profiling Identifies Genes Involved in Capsule Biosynthesis

(A) Cluster containing the chemical signatures of the pbx1Δ and cpl1Δ mutants.

(B) Cluster containing the chemical signatures of the cap60Δ mutants.

We then tested O2M using two drugs for which substantial
literature synergy information was available: FLC and
geldanamycin (GdA). FLC is an approved antifungal drug. GdA is an
inhibitor of Hsp90, a chaperone protein with many physical and
genetic interactions (Taipale et al., 2010). We performed our
analysis on fenpropimorph and sertraline, which are known to
act synergistically with FLC (Jansen et al., 2009; Zhai et al.,
2012), and cyclosporine and rapamycin, which are known to
act synergistically with GdA (Francis et al., 2006; Kumar et al.,
2005). Using this prior knowledge and our data, we identified
synergy biomarker genes for FLC (CNAG_00573,
CNAG_03664, and CNAG_03917) and GdA (CNAG_01172,
CNAG_03829, and CNAG_01862). We generated a list of compounds
from our chemical-genetics data set that contain one or more
of these genes in their responsive gene sets.

We then used a standard “checkerboard” assay to
experimentally determine the fractional inhibitory concentration index (FICI),
and we adopted the standard that an FICI value below 0.5 is
synergistic (Meletiadis et al., 2010). We determined FICIs for FLC and
GdA with three sets of compounds: (1) the compounds predicted
from synergy biomarker genes, (2) the predicted synergistic
compounds for the other drug (e.g., we tested compounds predicted
to be synergistic with GdA for synergy with FLC), and (3) a
randomly generated subset of the compounds not predicted to
act synergistically with either FLC or GdA. The second and third
groups serve as controls for compounds that are generally
synergistic and to determine the background frequency of synergistic
interactions within a set of compounds.
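
For reference, the FICI from a checkerboard assay is conventionally the sum of each drug's inhibitory concentration in combination divided by its MIC alone. The Python sketch below applies that definition with the cutoff adopted in the text (FICI below 0.5); the function names and the example concentrations are hypothetical, and taking the minimum over inhibitory wells is a common convention assumed here rather than a detail stated in this section.

```python
def fici(mic_a_alone, mic_b_alone, conc_a_combo, conc_b_combo):
    """Fractional inhibitory concentration index for one inhibitory well."""
    return conc_a_combo / mic_a_alone + conc_b_combo / mic_b_alone


def min_fici(mic_a_alone, mic_b_alone, inhibitory_wells):
    """Lowest FICI over all (conc_a, conc_b) wells that show no growth."""
    return min(fici(mic_a_alone, mic_b_alone, a, b) for a, b in inhibitory_wells)


def is_synergistic(value, cutoff=0.5):
    """Apply the criterion used in the text: FICI below the cutoff."""
    return value < cutoff


# Hypothetical example: MICs alone are 8 and 4 (same units per drug)
wells = [(4.0, 2.0), (1.0, 0.5), (2.0, 2.0)]
best = min_fici(8.0, 4.0, wells)   # 1/8 + 0.5/4
synergy = is_synergistic(best)
```

Because each term is a ratio, the two drugs may be measured in different units as long as each drug's combination and single-agent values share the same unit.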

Respective experimental FICI values for FLC and GdA are
shown in Figures 6B and 6C (yellow bars: synergy; blue bars:
additive or worse interactions). The labels for compounds we
predicted to be synergistic are colored purple, positive controls
(published synergistic compound pairs) are colored green, and
predicted negative control compounds are colored blue
(Figure 6). We observed that only ~10% of the negative control
compounds act synergistically with either FLC or GdA. In striking
contrast, we found that ~80% and ~60% of the compounds
selected by O2M are synergistic with FLC and GdA, respectively.
Thus, for two unrelated compounds, O2M is highly successful at
predicting synergistic interactions and performs vastly better
than the brute force trial-and-error approach (Figures 6D and
6E) (p < 0.0008, Fisher’s exact test).

DISCUSSION 

We applied chemogenomic profiling to the major fungal driver of
AIDS-related death, the encapsulated yeast C. neoformans, to
produce a chemical-genetic atlas of this important pathogen.
Beyond identifying new virulence factors and compound mode
of action, we describe a conceptually general approach to
identifying drug synergies that combines prior knowledge and
chemogenomic profiles.

A Chemical-Genetic Atlas for C. neoformans 

We maximized the quality of the atlas in several ways. To capture
concentration-dependent impacts of compounds, we obtained
the MIC for each compound and examined the genetic
responses at multiple concentrations below MIC. In addition, we
performed a large number of control screens and incorporated
batch information for systematic correction. Overall benchmarks
of data quality (Figure 1) together with nearest neighbor and
Gene Ontology analysis (Figure 2) support the existence of
substantial chemical-genetic signal in the data. Even genes with
orthologs in both S. cerevisiae and C. neoformans show
considerable differences in responses (Figure 3). While this may not be
surprising given the large phylogenetic distance between these
fungi, it shows that understanding the chemical responses of
pathogens requires pathogen-focused studies, even when
considering conserved genes and processes. For example, we
observed differences in the responses to azole drugs between
S. cerevisiae and C. neoformans (Figure 3). Since azoles are
heavily used clinically, differences in responses between species
are of significant interest.

Insights Gained from Initial Use of the C. neoformans 
Chemical-Genetic Atlas 

Identification of Mutants that Impact Capsule Formation
and Mammalian Infection

Our studies on capsule biosynthesis genes focused on two different
clusters that contained genes that we and others have shown to
be required for capsule formation, the pbx1Δ/cpl1Δ cluster and
the cap60Δ cluster. As anticipated from model organism studies
(Collins et al., 2007; Costanzo et al., 2010; Nichols et al., 2011;
Parsons et al., 2004; Parsons et al., 2006), these clusters were
indeed enriched for genes whose mutants are defective in
capsule biosynthesis and mammalian pathogenesis. The genes
represented by the two clusters differed functionally in that
genes in the pbx1Δ/cpl1Δ cluster but not the cap60Δ cluster are
required for association of capsule polysaccharide with the cell
surface (Figures 4 and S2). A recent study on Pbx1 and its
ortholog, Pbx2, proposes that the two proteins act redundantly in
capsule assembly (Kumar et al., 2014). In that study, pbx1Δ and pbx2Δ
cells shed lower amounts of GXM into the culture medium, but
the GXM functions in a capsule transfer assay. Electron
microscopy studies indicate that these mutants exhibit defects in the
cell wall. Our data are fully consistent with these findings. Other
genes from the pbx1Δ/cpl1Δ cluster likely play a role in these
processes. Some, like GCN5 and SGF73, which encode
orthologs of the yeast SAGA histone acetylase/deubiquitylase
complex, are clearly regulatory, while others could act more directly.
While detailed validation and investigation of these many
candidates (including gene deletion reconstruction studies) will be
required to obtain mechanistic insight into capsule biology, their
enrichment suggests the value of this cryptococcal chemogenomic
resource in identifying mutants defective in virulence.
Compound Target Identification 

Chemogenomic profiling has proven useful in identifying targets
of uncharacterized compounds (Parsons et al., 2006), including
in the pathogenic fungus C. albicans (Jiang et al., 2008; Xu et al.,
2007; Xu et al., 2009). Chemical-genetic data can be used to
determine the target of compounds within complex mixtures
(Jiang et al., 2008; Xu et al., 2009). Our goal differed: we sought
to identify targets of repurposed compounds, as described
elsewhere (Butts et al., 2013), or, in the case of S8, a compound
identified as an inhibitor of Candida biofilms (Kagan et al., 2014). The
identification of the Wee1 kinase as a sensitivity determinant for
S8, the cell-cycle arrest produced by S8, and the ability of the
compound to inhibit CnCdc25 in vitro together support the
model that S8 inhibits growth through the cell cycle, at least
in part via inhibition of Cdc25. Whether this explains its impact
on biofilms requires further investigation. As with any compound
target, ultimate proof that Cdc25 is the target of S8 will require
the isolation of resistance alleles of CDC25.

(C) Images of individual cells grown in 10% Sabouraud’s broth to induce capsule. Representative cells are shown for mutants that exhibit a statistically significant
phenotype. Scale bar, 5 µm.

(D) Quantification of capsule sizes from all mutants in the pbx1Δ/cpl1Δ (purple labels) cluster or cap60Δ (green labels) cluster. 100 cells were measured for each
strain, the error bars represent the standard deviation, and p values were calculated using Student’s t test.

(E) Colony counts from colony forming units (cfu) extracted from mouse lungs following an inhalation infection. Three mice are shown for each datapoint; the error
bars represent the standard deviation and p values were calculated using Student’s t test.

Given the simplicity of the pharmacophore and its Ki for
CnCdc25, it would not be surprising if S8 had additional cellular
targets, as recently described (Feldman et al., 2014). Cdc25 is a
conserved cell-cycle phosphatase and therefore might be
considered a poor drug target a priori, but cyclin-dependent
kinases are a focus of recent antiparasite therapeutics (Geyer
et al., 2005). It is also notable that the target of azole antifungals,
lanosterol 14-demethylase (Ghannoum and Rice, 1999), is
conserved from yeast to human.

O2M: Predicting Compound Synergies Using Prior
Knowledge and Chemical Profiles

Identifying synergistic drug interactions is of considerable
clinical interest, but efficient methods for their identification are
elusive. Systematic examination of combinations of a small set
of compounds using S. cerevisiae suggests that synergies are
relatively rare and often involve so-called “promiscuous”
synergizers, compounds that are synergistic with multiple partners
(Cokol et al., 2011). Chemogenomic studies have shown that
drugs known to be synergistic tend to have overlapping
“responding” gene sets (Jansen et al., 2009). We expanded on
this concept to develop a highly parallel method, O2M, for
efficiently predicting synergistic drug interactions. Our work utilizes
prior knowledge of drug synergies to identify a discrete set of
predictive biomarker genes for a given compound. We
experimentally demonstrated the utility of O2M for two compounds,
FLC and geldanamycin. Our method identified dozens of
synergistic interactions and discovered a small number of biomarkers
that could serve as readouts for further screens for synergistic
drugs. The method appears to not simply select promiscuous
synergizers: five of six drugs previously classified as
promiscuous synergizers (Cokol et al., 2011) were tested in our studies but
most were not predicted to be synergistic by O2M. One of the
promiscuous compounds was a positive control (fenpropimorph
with FLC) and another (dyclonine) was predicted to be synergistic with
FLC but was not, and was predicted not to be synergistic with GdA but
was. We anticipate that O2M could be used to identify
synergistic compound interactions in published E. coli and C. albicans
chemical-genetics data sets (Jiang et al., 2008; Nichols et al.,
2011; Xu et al., 2007; Xu et al., 2009).

EXPERIMENTAL PROCEDURES 
Determination of MICs 

We determined MIC on solid growth medium for each compound used in 
screening (Table 1). 

Colony Array-Based Chemogenomic Profiling

C. neoformans knockouts were inoculated from frozen 384-well plates to 
YNB + 2% glucose. Plates were grown 24 hr at 30°C, then used to inoculate 
screening plates containing compounds of interest. 

Data Analysis

Data were analyzed as previously described (Baryshnikova et al., 2010) with
a few exceptions.

C. neoformans Ortholog Identification and GO Term Mapping

Mapping from S. cerevisiae UniProt proteins to C. neoformans UniProt
proteins was done using one-to-one mappings in MetaPhOrs
(http://metaphors.phylomedb.org/). C. neoformans ORFs were compared to a
database of S. cerevisiae UniProt proteins using blastp (Altschul et al., 1997) with an
E-score cutoff of 10“^°. Corresponding yeast GO annotations were mapped
onto the C. neoformans ORFs.
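
The mapping step described above can be sketched in Python as a best-hit filter followed by annotation transfer. The data layout, function names, identifiers, and the cutoff passed in the usage lines are all hypothetical; in particular, the cutoff is left as a required argument and the example value is a placeholder, not the study's threshold.

```python
def best_hits(hits, evalue_cutoff):
    """Keep, for each query ORF, the lowest E-value hit at or below the cutoff.

    hits: iterable of (query_orf, yeast_protein, evalue) tuples, for example
    parsed from tabular blastp output (parsing is not shown here).
    """
    best = {}
    for query, subject, evalue in hits:
        if evalue > evalue_cutoff:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def map_go_terms(orthologs, yeast_go):
    """Transfer yeast GO annotations onto C. neoformans ORFs via orthologs."""
    return {orf: set(yeast_go.get(sc, ())) for orf, sc in orthologs.items()}


# Hypothetical identifiers and annotations; cutoff value is a placeholder
example_hits = [
    ("orf_1", "yeast_A", 1e-50),
    ("orf_1", "yeast_B", 1e-20),
    ("orf_2", "yeast_C", 1e-3),
]
orthologs = best_hits(example_hits, evalue_cutoff=1e-10)
go_by_orf = map_go_terms(orthologs, {"yeast_A": {"term_x", "term_y"}})
```

ORFs without a hit passing the cutoff, such as orf_2 in this toy example, simply receive no GO annotation and drop out of downstream enrichment analyses.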

Comparison of Transcriptional Response to FLC

We compared transcriptional responses between S. cerevisiae (Kuo et al., 2010)
and C. neoformans (Florio et al., 2011).

Capsule Induction Assay 

Samples were grown overnight at 30°C in 100% Sabouraud’s broth, then
diluted 1:100 into 10% Sabouraud’s broth buffered with 50 mM HEPES pH
7.3 and grown for 3 days at 37°C. India ink was added at a 3:1 ratio and samples
were imaged on a Zeiss Axiovert microscope.

Capsule Transfer Assay 

Performed as described (Reese and Doering, 2003), with minor modifications.

GXM Immunoblot Assay 

Conditioned medium was made from GXM donor strains as described
above.



Figure 5. C. neoformans Cdc25 Is a Target of S8 In Vivo and In Vitro 

(A) Chemical-genetic data of the growth scores of each knockout mutant grown on S8 (y axis). The mutant that exhibited the greatest resistance is wee7 A. The 
mutant strain that showed the greatest sensitivity to S8 is cnag_04462b^. 

(B) Structures of S8, NA8, and NSC 663284. The structure of S10 is shown in Figure S3C. 

(C) G2/M regulation (Morgan, 2007). 

(D) DNA content of asynchronous C. neoformans culture split into aliquots for treatment with compounds of interest, with samples harvested at appropriate times. 
Data for DMSO-treated culture is shown. 

(E) DNA content from NA8-treated culture from same starting culture as Figure 5F. 

(F) DNA content from S8-treated culture from same starting culture as Figure 5F. 

(G) Phosphatase activity of purified C. neoformans Cdc25 catalytic domain (CNAG_01572, aa442-662). The average of three independent replicates is shown, and 
the error bars represent the standard deviation. 

(H) Michaelis-Menten kinetics of S8 inhibition of CnCdc25 from in vitro phosphatase activity. A noncompetitive model of enzyme inhibition produced the best R² 
value (0.94). 
Mouse Infection Assay 

Mouse lung infections were performed as previously described (Chun et al., 
2011). 

Cdc25 Protein Purification 

We identified the C. neoformans ortholog of Cdc25, CNAG_01572, by best 
reciprocal BLAST (Altschul et al., 1997) hit with the human Cdc25A, 
Cdc25B, and Cdc25C protein isoforms. We then inserted the exonic 
sequence of the catalytic domain into a 6x-His tag expression vector for 
purification. 
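
The best-reciprocal-hit criterion used above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the protein identifiers, the E-values, and the 1e-5 cutoff are all hypothetical placeholders, and the input is assumed to be a list of (query, subject, E-value) tuples already parsed from BLAST output.

```python
def best_hits(hits, max_evalue):
    """Map each query to its lowest-E-value subject, ignoring weak hits."""
    best = {}
    for query, subject, evalue in hits:
        if evalue > max_evalue:
            continue  # hit is weaker than the cutoff
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward, reverse, max_evalue=1e-5):
    """Return (query, subject) pairs that are each other's best hit.

    The default cutoff is an assumed placeholder, not a value from the text.
    """
    fwd = best_hits(forward, max_evalue)
    rev = best_hits(reverse, max_evalue)
    return sorted((q, s) for q, s in fwd.items() if rev.get(s) == q)


# Hypothetical toy data: fungal ORFs searched against human proteins and back.
forward = [
    ("fungal_A", "human_X", 1e-40),
    ("fungal_A", "human_Y", 1e-6),
    ("fungal_B", "human_X", 1e-8),
]
reverse = [
    ("human_X", "fungal_A", 1e-38),
    ("human_Y", "fungal_C", 1e-20),
]

pairs = reciprocal_best_hits(forward, reverse)
print(pairs)
```

Only fungal_A and human_X select each other in both directions, so fungal_B is rejected even though it also hits human_X.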

Cdc25 Phosphatase Assay 

Cdc25 phosphatase activity was analyzed in activity buffer (50 mM Tris pH 8.3, 
5% glycerol, 0.8 mM dithiothreitol, and 1% PVA). 

Cdc25 Inhibitor Treatment and FACS Analysis 

Wild-type C. neoformans was grown overnight in 1x YNB at 30°C with 
rotation. Cultures were diluted to OD600 ~0.2 into 150 ml 1x YNB, then 
incubated 3 hr at 30°C. Samples were then split and NA8, S8, and S10 
added to 60 μM. An equivalent volume of DMSO was added to the control 
culture. 

Fractional Inhibitory Concentration Index Assay for Synergy 

We determined FICI using a standard checkerboard assay (Hsieh et al., 1993), 
calculating FICI as described using a 50% growth inhibition cutoff for MICs for 
individual compounds (Hsieh et al., 1993; Meletiadis et al., 2010), then using a 
standard cutoff of FICI < 0.5 to define synergy. 
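
The FICI calculation can be illustrated with a short sketch. It assumes the usual two-compound definition (the sum of each compound's MIC in combination divided by its MIC alone); the MIC values below are invented for illustration, and only the synergy cutoff of FICI < 0.5 comes from the text.

```python
def fici(mic_a_alone, mic_b_alone, mic_a_combo, mic_b_combo):
    """Fractional inhibitory concentration index for a two-compound pair.

    Each term is the MIC of a compound in combination divided by its MIC
    alone; all four values must use the same growth-inhibition cutoff.
    """
    return mic_a_combo / mic_a_alone + mic_b_combo / mic_b_alone


def is_synergistic(fici_value, cutoff=0.5):
    """Apply the synergy definition stated in the text (FICI < 0.5)."""
    return fici_value < cutoff


# Hypothetical MICs (same arbitrary units for alone and in combination).
synergy_case = fici(mic_a_alone=8.0, mic_b_alone=16.0,
                    mic_a_combo=1.0, mic_b_combo=2.0)
additive_case = fici(mic_a_alone=8.0, mic_b_alone=16.0,
                     mic_a_combo=4.0, mic_b_combo=8.0)

print(synergy_case, is_synergistic(synergy_case))
print(additive_case, is_synergistic(additive_case))
```

In a checkerboard assay, the index would typically be evaluated for each inhibitory well and the minimum reported; that step is omitted here.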

See Extended Experimental Procedures for additional details. 

SUPPLEMENTAL INFORMATION 
SUMMARY 

Glutamine is the primary metabolite of nitrogen assimilation from 
inorganic nitrogen sources in microorganisms and plants. The ability 
to monitor cellular nitrogen status is pivotal for maintaining 
metabolic homeostasis and sustaining growth. The present study 
identifies a glutamine-sensing mechanism common in the entire plant 
kingdom except Brassicaceae. The plastid-localized PII signaling 
protein controls, in a glutamine-dependent manner, the key enzyme of 
the ornithine synthesis pathway, N-acetyl-L-glutamate kinase (NAGK), 
that leads to arginine and polyamine formation. Crystal structures 
reveal that the plant-specific C-terminal extension of PII, which we 
term the Q loop, forms a low-affinity glutamine-binding site. 
Glutamine binding alters PII conformation, promoting interaction and 
activation of NAGK. The binding motif is highly conserved in plants 
except Brassicaceae. A functional Q loop restores glutamine sensing in 
a recombinant Arabidopsis thaliana PII protein, demonstrating the 
modular concept of the glutamine-sensing mechanism adopted by PII 
proteins during the evolution of plant chloroplasts. 

INTRODUCTION 

Glutamine is the primary product of nitrogen assimilation from 
inorganic nitrogen sources and a central metabolite in nitrogen 
metabolism in plants. The regulation of nitrogen metabolism and the 
maintenance of a proper carbon-nitrogen balance in plants are 
extremely complex and interwoven at multiple levels. How plants sense 
nitrogen status is an intriguing issue of current research (e.g., 
Nunes-Nesi et al., 2010). Glutamine is an important nitrogen status 
reporter in many bacteria (Forchhammer, 2007) but, in plants, 
glutamine sensing is poorly understood. In prokaryotes, PII signaling 
proteins (in the following termed PII proteins) have been identified 
as central signal-integrating molecules coordinating nitrogen 
assimilatory reactions in response to the general metabolic state of 
cells. The PII protein family represents one of the most abundant 
families of signaling proteins in nature, occurring in bacteria, 
Archaea, and plants (Chellamuthu et al., 2013; Leigh and Dodsworth, 
2007; Sant’Anna et al., 2009). PII proteins are able to sense and 
integrate signals from central metabolism, in particular 
2-oxoglutarate (2-OG), an indicator of C-N balance, as well as the 
energy status via competitive ADP/ATP binding (Fokina et al., 2010; 
Jiang and Ninfa, 2009b; Zeth et al., 2014). 

The concentration of these metabolite effectors is sensed by binding 
to an ensemble of intercommunicating sites in the trimeric PII 
proteins, where the actual effector molecule binding status is 
translated into distinct PII conformations. The T loop, a highly 
flexible loop segment protruding from each subunit, plays a key role 
in ligand binding and receptor interactions (Forchhammer, 2008; Huergo 
et al., 2013). Depending on the conformation of the T loop, PII 
proteins can bind to various receptors, thereby exerting control at 
all levels of metabolic regulation (transport activity, metabolic 
reactions, gene expression). This basic mode of PII function, based on 
direct sensing of effector molecules, is conserved in almost all PII 
proteins. 

A second, phylogenetically diverse regulatory layer may be 
superposed on PII signaling: covalent modification of apical residues 
of the T loop allows the integration of additional signals. In 
proteobacteria, PII proteins can be covalently modified by 
uridylylation at T loop residue Tyr51. The modifying enzyme GlnD, a 
bifunctional uridylyltransferase/uridylyl-removing enzyme (EC 
2.7.7.59), responds to the cellular glutamine level such that low 
glutamine levels trigger the PII-uridylylation reaction (Adler et al., 
1975; Ninfa and Jiang, 2005). This results in Tyr51-uridylylated PII, 
which is, for instance, not able to interact with the ammonium channel 
AmtB (Radchenko and Merrick, 2011). Conversely, increasing glutamine 
levels cause PII-deuridylylation (Adler et al., 1975), allowing the 
interaction of PII with various targets (Huergo et al., 2013). As a 
consequence, glutamine-dependent uridylylation/deuridylylation of PII 
is the prevailing signaling mechanism of the Escherichia coli PII 
system (Forchhammer, 2007; Ninfa and Atkinson, 2000). In analogy to 
proteobacterial PII-uridylylation, PII in actinobacteria is 
adenylylated at Tyr51 in a glutamine-dependent manner (Hesketh et al., 
2002). In cyanobacteria, the neighboring residue Ser49 is 
phosphorylated in response to nitrogen starvation (Forchhammer and 
Tandeau de Marsac, 1994), but the detailed molecular signals 
controlling the kinase activity are still unknown. In many other 
organisms, this second regulatory layer of covalent modification of 
the T loop is apparently missing, as in Bacillus (Heinrich et al., 
2006), Archaea (Leigh and Dodsworth, 2007), and plant PII proteins 
(Smith et al., 2004; Uhrig et al., 2009). 

Phylogenetic analysis has inferred that plant PII proteins are of 
cyanobacterial origin and have been conserved during the evolution of 
the Chloroplastida (plant kingdom, comprising green algae and land 
plants) from the ancestral cyanobacterial endosymbiont to higher 
plants. By contrast, PII signaling has been lost in some red algae and 
in the Chromalveolata (Chellamuthu et al., 2013). The only plant PII 
protein that has been studied biochemically in depth is that of 
Arabidopsis thaliana (Mizuno et al., 2007a; Smith et al., 2003). 
Physiological studies indicate that plant PII proteins are part of a 
complex signal-transduction network mediating nitrogen regulation 
(Hsieh et al., 1998; Nunes-Nesi et al., 2010). PII mutants of A. 
thaliana are affected in the synthesis of compounds derived from the 
ornithine/arginine synthesis pathway (Ferrario-Mery et al., 2006), 
overaccumulate carbon metabolites (Ferrario-Mery et al., 2005), and 
show impaired control of nitrite uptake into the chloroplast 
(Ferrario-Mery et al., 2008). Furthermore, PII seems to be required 
for proper seed development (Uhrig et al., 2009). Such tissue-specific 
roles of PII signaling together with the tightly intertwined complex 
network of signal-transduction cascades (Nunes-Nesi et al., 2010) 
complicate the elucidation of PII-specific functions in higher plants. 
To study fundamental problems in plant chloroplasts, the simple 
unicellular green alga Chlamydomonas reinhardtii has proved an 
excellent model organism (Harris, 2001). For this reason, we started 
studying PII signaling in C. reinhardtii (Ermilova et al., 2013). 

Like its higher plant homologs, C. reinhardtii PII (CrPII) is 
localized in the chloroplast. Residues involved in effector molecule 
binding are conserved, which suggests that CrPII functions as a 
signaling protein as well. The chloroplast-localized enzyme 
N-acetyl-L-glutamate kinase (NAGK), which catalyzes the committed step 
in the ornithine/arginine biosynthesis pathway, is an important 
PII-interacting protein in cyanobacteria and higher plants 
(Ferrario-Mery et al., 2006; Heinrich et al., 2004; Sugiyama et al., 
2004). The structures of the PII-NAGK complex from the cyanobacterium 
Synechococcus elongatus and from A. thaliana have been solved and are 
highly similar (Llacer et al., 2007; Mizuno et al., 2007b). Beez et 
al. (2009) have shown that the PII-NAGK proteins from S. elongatus and 
A. thaliana functionally complement each other in vitro, highlighting 
striking functional conservation. Binding of PII enhances NAGK 
activity in the presence of the feedback inhibitor arginine, an effect 
that can be antagonized by the PII-effector molecule 2-OG. When PII 
binds to NAGK, the T loops of PII must fold in a tightly flexed 
structure to insert into the interdomain crevice of the adjacent NAGK 
subunits (Llacer et al., 2007; Mizuno et al., 2007b). 

An unsolved question in the evolution of plant PII proteins concerns 
the appearance of an elongated C terminus. Whereas all bacterial PII 
proteins have an almost invariable C terminus, the PII proteins from 
the plant kingdom contain a C-terminal extension of 13-19 amino acids 
with a conserved motif. The function of the C-terminal extension is so 
far unknown. In the present work, we studied the structural and 
biochemical properties of CrPII. We discovered that the C-terminal 
extension of CrPII forms a small loop structure (termed the Q loop) 
that binds glutamine and is required for glutamine-dependent complex 
formation with NAGK. The same glutamine dependence was observed for 
recombinant Physcomitrella and Oryza PII proteins, which implies that 
plant PII proteins function as glutamine sensors via the C-terminal Q 
loop extension. 

RESULTS 

Basic Properties of C. reinhardtii NAGK, the Putative PII 
Target 

The predicted full-length NAGK polypeptide encoded by the C. 
reinhardtii AGK1 cDNA consists of 340 amino acids with a calculated 
molecular mass of 35,971 Da. The N terminus contains a predicted 
chloroplast transit peptide (residues 1-42), which suggests that, like 
its higher plant homologs, C. reinhardtii NAGK (CrNAGK) protein 
resides in the chloroplast with a molecular mass of 31,474 Da. A 
previous survey showed that all NAGK proteins from cyanobacteria and 
plants (Chloroplastida), and from those red algae that contain PII 
proteins, share the signature residues involved in NAGK-PII 
interaction (Chellamuthu et al., 2013; an alignment is shown in Figure 
S1 available online), which strongly implies that they all are targets 
of PII signaling. To study the biochemical properties and to 
characterize the putative interaction between CrPII and CrNAGK, a 
recombinant CrNAGK protein was generated (starting from residue Met44) 
with its N terminus fused to a His6 tag. Analysis of the purified 
CrNAGK protein by gel filtration revealed that most of the protein 
eluted in the expected hexameric state, whereas a smaller fraction 
eluted with an apparent higher mass. By comparison, A. thaliana NAGK 
(AtNAGK) eluted as a single peak, corresponding to the hexameric 
enzyme (Figure S2). The kinetic constants of CrNAGK showed a Km value 
for N-acetyl glutamate of 7.77 mM and a kcat of 56.76 s⁻¹ (calculated 
with NAGK hexamer). This Km value is 9-fold higher than that of AtNAGK 
but very similar to that of the cyanobacterial S. elongatus NAGK 
(SeNAGK) (Km [NAG] of 7.4 mM), whereas the kcat value is intermediate 
between that of AtNAGK (126 s⁻¹) and SeNAGK (13 s⁻¹). Feedback 
inhibition by arginine occurred with a half-maximal inhibitory 
concentration (IC50) of 0.11 mM (Figure 1B), which is intermediate 
between that of cyanobacterial SeNAGK (0.02 mM) and plant AtNAGK (1 
mM). 

Glutamine-Dependent Interaction between CrPII 
and NAGK 

To test the effect of CrPII on the catalytic activity of CrNAGK, we 
assayed NAGK using a coupled ATPase assay (Beez et al., 2009). In a 
control, ATPase activity of CrPII in the absence of NAGK was below the 
detection limit of the assay. The previously reported ATP hydrolysis 
activity of PII proteins is not catalytic, but needs a large excess of 
PII over ATP (Radchenko et al., 2013). 

Determining the relief from arginine inhibition by PII-NAGK complex 
formation represents the most sensitive assay for PII-NAGK 
interaction. In arginine-inhibition assays with CrNAGK in the presence 
or absence of CrPII (Figure 1A), the presence of CrPII surprisingly 
did not change arginine inhibition of CrNAGK, nor did it change its 
catalytic activity. As a potentially suitable positive control of CrPII 
function, we tested the effect of CrPII on AtNAGK, which we previously 
showed to functionally interact 
Figure 1. Arginine Inhibition of CrNAGK Activity in the Presence of 
CrPII 

(A) Arginine inhibition of CrNAGK in the absence or presence of CrPII, 
without or with 5 or 10 mM glutamine, as indicated. 

(B) Arginine inhibition of CrNAGK in the presence of 10 mM glutamine 
or in the presence of CrPII and 10 mM glutamine. Data were fitted to a 
sigmoidal dose-response curve, yielding an EC50 (arginine) of 0.11 ± 
0.04 mM and 1.05 ± 0.15 mM for free or CrPII-interacting CrNAGK, 
respectively. 

When exceeding the size of the data point symbols, standard deviation 
from triplicate measurements is indicated by error bars. 



not only with its own PII protein but also with PII from the 
cyanobacterium S. elongatus (Beez et al., 2009). However, CrPII did 
not affect the arginine response of AtNAGK (data not shown). We 
therefore suspected that an additional component may be required for 
CrPII interaction with NAGK. We added amino acids and metabolites from 
core metabolism (glutamate, glutamine, aspartate, asparagine, alanine, 
serine, glycine, acetyl-CoA, and tricarboxylic acid cycle 
intermediates) to the arginine-inhibition assays of CrNAGK-CrPII. Of 
the various compounds tested, glutamine indeed rescued the activity of 
CrPII; in the presence of 5 or 10 mM glutamine, the inhibitory effect 
of arginine was strongly antagonized (Figure 1A). In the presence of 
PII and 10 mM glutamine, the IC50 of arginine for CrNAGK increased 
from 0.1 mM to 1 mM (Figure 1B). The highest difference in activity 
between CrNAGK with or without CrPII was with 0.4 mM arginine. 
Therefore, we titrated the effect of glutamine with 0.4 mM arginine. 
Glutamine activated arginine-inhibited CrNAGK in the presence of CrPII 
in a concentration-dependent manner (Figure 2A), whereas glutamine had 
no effect in the absence of CrPII. The half-maximal effective 
concentration (EC50) of glutamine for activation of NAGK by CrPII was 
2.4 ± 0.8 mM. To reveal whether the glutamine dependence of CrPII-NAGK 
activation is a property of CrPII or of CrNAGK, heterologous assays 
were carried out using AtNAGK. CrPII had no effect on AtNAGK in the 
absence of glutamine. By contrast, arginine inhibition was relieved by 
glutamine in a concentration-dependent manner (Figure 2B). The 
response was similar to the glutamine-dependent activation of CrNAGK 
by CrPII, with a glutamine EC50 of 4.6 ± 2.4 mM, which showed that the 
glutamine dependence is a property of CrPII. 
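
To make the EC50 values concrete, the following is a minimal sketch of a sigmoidal (Hill-type) dose-response function. The functional form, the Hill coefficient of 1, and the normalized 0-to-1 response scale are assumptions made for illustration; only the two glutamine EC50 values are taken from the text, and no fitting to the measured data is attempted.

```python
def dose_response(conc_mm, bottom, top, ec50_mm, hill=1.0):
    """Sigmoidal dose-response value at a single effector concentration.

    bottom/top are the assumed activities without effector and at
    saturation; hill=1.0 is an assumed (non-cooperative) slope.
    """
    if conc_mm <= 0:
        return bottom  # no effector present: baseline activity
    return bottom + (top - bottom) / (1.0 + (ec50_mm / conc_mm) ** hill)


# Glutamine EC50 values reported in the text (mM).
EC50_WITH_CRNAGK = 2.4
EC50_WITH_ATNAGK = 4.6

# By construction, the response at the EC50 is halfway between bottom and top.
half = dose_response(EC50_WITH_CRNAGK, 0.0, 1.0, EC50_WITH_CRNAGK)

# Normalized activation across a hypothetical glutamine titration (mM).
titration = (0, 1, 2.5, 5, 10)
curve = [dose_response(c, 0.0, 1.0, EC50_WITH_CRNAGK) for c in titration]
for conc, value in zip(titration, curve):
    print(f"{conc:>4} mM glutamine -> {value:.2f}")
```

With EC50 values in the low millimolar range, the response changes most steeply over roughly 1 to 10 mM glutamine, which is the concentration window probed in the assays described above.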

To determine the response of CrPII-CrNAGK interaction toward 2-OG, we 
titrated CrPII and CrNAGK (5 CrPII trimers:1 CrNAGK hexamer) in the 
presence of 10 mM glutamine and 0.4 mM arginine with increasing 
concentrations of 2-OG. 2-OG tuned down NAGK activity with an IC50 of 
1.18 ± 0.02 mM (Figure 2C). An almost identical response toward 2-OG 
was obtained for the heterologous assay with CrPII and AtNAGK (Figure 
2D; IC50 of 1.26 ± 0.96 mM). The catalytic constants of the 
CrPII-CrNAGK complex determined at 5 mM glutamine showed that CrPII 
does not enhance the Vmax (54 s⁻¹) of the NAGK reaction but lowers the 
Km for NAG about 2-fold (3.9 ± 0.4 mM as compared to 7.8 ± 0.8 mM). 
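
The consequence of a lower Km at an unchanged Vmax can be illustrated with the Michaelis-Menten equation. This sketch assumes simple Michaelis-Menten behavior and, as a simplification, uses the same turnover of 54 s⁻¹ for free and complexed enzyme; the Km values are those quoted above, and the NAG concentrations are arbitrary illustrative choices.

```python
def mm_rate(substrate_mm, vmax, km_mm):
    """Michaelis-Menten rate: v = Vmax * [S] / (Km + [S])."""
    return vmax * substrate_mm / (km_mm + substrate_mm)


VMAX = 54.0        # s^-1, assumed identical for both enzyme states
KM_FREE = 7.8      # mM NAG, CrNAGK alone
KM_COMPLEX = 3.9   # mM NAG, CrNAGK with CrPII and glutamine

# At a subsaturating NAG concentration the complex turns over faster.
v_free_low = mm_rate(3.9, VMAX, KM_FREE)
v_complex_low = mm_rate(3.9, VMAX, KM_COMPLEX)

# Near saturation the two states converge toward the shared Vmax.
v_free_high = mm_rate(390.0, VMAX, KM_FREE)
v_complex_high = mm_rate(390.0, VMAX, KM_COMPLEX)

print(f"3.9 mM NAG: free {v_free_low:.1f}, complex {v_complex_low:.1f} s^-1")
print(f"390 mM NAG: free {v_free_high:.1f}, complex {v_complex_high:.1f} s^-1")
```

Under these assumptions the benefit of complex formation is largest when NAG is limiting and vanishes as the substrate approaches saturation.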



Analysis of CrPII-NAGK Complex Formation 

To confirm that the glutamine-dependent regulatory effect of CrPII on 
NAGK is due to direct interaction, we analyzed complex formation using 
surface plasmon resonance (SPR) spectroscopy. The N-terminally 
His-tagged CrNAGK protein was immobilized on an Ni-NTA sensor chip and 
probed with CrPII together with effector molecules. CrPII bound to 
CrNAGK in a glutamine-dependent manner; with 5 mM glutamine, complex 
formation appeared to be saturated (Figure 3A). This result confirmed 
that complex formation between CrPII and CrNAGK is strictly glutamine 
dependent. Furthermore, complex formation of CrPII with CrNAGK 
strictly required Mg-ATP and was not supported by ADP (Figure 3B). 
Finally, we analyzed the effect of 2-OG on CrPII-NAGK complex 
formation. Regardless of the 2-OG concentration added to the assay 
mixture (containing 1 μM CrPII and 5 mM glutamine), no antagonistic 
effect on complex formation was observed (Figure 3C), although these 
2-OG concentrations prevented activation of NAGK (Figure 2C). This 
result resembles the case of A. thaliana PII-NAGK interaction, in 
which the 2-OG concentrations required to inhibit complex formation 
are three orders of magnitude higher than those required to impair 
NAGK activation by PII (Beez et al., 2009). Similarly, in E. coli, the 
regulation of NtrB activity by PII, but not its binding, is controlled 
by 2-OG (Jiang and Ninfa, 2009a). Thus, the inhibitory effect of 2-OG 
on PII-mediated activation of NAGK in C. reinhardtii appears to occur 
postbinding. 

Structural Characterization of CrPII 

The EC50 of glutamine for stimulating CrPII-NAGK interaction was in 
the millimolar range, which indicated a low affinity of CrPII for 
glutamine, outside the sensitivity range of direct ligand-binding 
assays. Titration of glutamine to CrPII by isothermal titration 
calorimetry (ITC) yielded only marginal calorimetric signals. To 
define the role of glutamine and identify its putative binding site in 
CrPII, we used crystallographic analysis. First, we set up CrPII 
crystallization trials in the presence of different effector 
molecules. From these trials, we obtained well-diffracting crystals 
that grew in the presence of Mg-ATP and glutamine, for which we could 
solve the structure by molecular replacement using the SePII structure 
(Protein Data Bank [PDB] ID code 2XZW) as a search model. The 
resulting CrPII structure has the expected 
Figure 2. Effect of Glutamine or 2-OG on CrPII Activation of CrNAGK 
and AtNAGK 

CrNAGK assays contained 0.4 mM arginine; AtNAGK assays contained 3 mM 
arginine. 

(A) Glutamine-dependent activation of CrNAGK by CrPII (circles) or by 
CrPII-ΔC (squares). 

(B) Glutamine-dependent activation of AtNAGK by CrPII. 

(C and D) Antagonistic effect of 2-OG on CrPII/10 mM 
glutamine-mediated activation of (C) CrNAGK or (D) AtNAGK. 

The standard deviation from triplicate measurements is indicated by 
error bars. 




trimeric assembly and superimposes (with a root-mean-square deviation 
[rmsd] of 0.85 Å over 271 aligned Cα positions) closely on the SePII 
structure 2XZW. However, neither Mg-ATP nor glutamine was identified 
in the structure, and both the T loop and the whole C-terminal 
extension were disordered (Figure 4A). Therefore, we then tried to 
bind ligands to the protein by soaking the apo crystals in solutions 
containing different combinations of effector molecules. This was 
successful for the combination of Mg-ATP and 2-OG, which resulted in a 
crystal structure with asymmetrically occupied effector sites: one 
site occupied by Mg-ATP and 2-OG in full occupancy, the second site 
occupied by Mg-ATP and 2-OG in half-occupancy, and the third site 
empty (Figure 4B). Half-occupancy denotes that, on average, throughout 
the crystal, only every second PII trimer has a molecule bound in the 
respective binding site, which indicates an interaction weaker than 
binding with full occupancy. The structure superimposes with an rmsd 
of 0.47 Å on the apo structure, and both the T loop and C-terminal 
extension are unstructured in all subunits. The binding mode of the 
effector molecules is essentially the same as reported for various 
bacterial and archaeal PII proteins (Fokina et al., 2010; Maier et 
al., 2011; Truan et al., 2010). The asymmetrical filling of the sites 
suggests that binding of the effector molecules is anticooperative, as 
reported previously for the E. coli (Jiang and Ninfa, 2009a) and S. 
elongatus PII protein (Fokina et al., 2010; Zeth et al., 2014). 

Identification of the Glutamine-Binding Site in the 
PII-NAGK Complex Structure 

Crystallization trials were carried out with CrPII-NAGK complexes in 
the presence of glutamine and different combinations of effector 
molecules. Because initial crystallization trials with the homologous 
CrNAGK were not fruitful, we set up trials with the heterologous 
AtNAGK, because the glutamine response of the CrPII-AtNAGK complex was 
almost identical to that of the homologous CrPII-CrNAGK complex. This 
setup rewarded us with crystals of the whole complex that finally 
disclosed the binding mode of glutamine. Glutamine was localized to 
the C-terminal extension of CrPII. Interestingly, these CrPII-AtNAGK 
crystals have the same crystal packing as those of the AtPII-AtNAGK 
structure (PDB ID code 2RD5), so that the structure could be solved 
directly on the basis of the 2RD5 coordinates (see Extended 
Experimental Procedures). Consequently, the complex is almost 
identical to the AtPII-AtNAGK complex and very similar to the 
SePII-NAGK complex, with a hexameric NAGK toroid sandwiched between 
two PII trimers (Figure 5A). As described for the A. thaliana PII-NAGK 
complex (Mizuno et al., 2007b), the NAGK subunits are in an open 
conformation on one face of the toroid and in a closed conformation on 
the other face. The ligand-binding sites of the closed subunits are 
occupied with arginine, NAG, and Mg-ADP, whereas the subunits in open 
conformation are occupied with arginine and the reaction product 
NAG-phosphate but not with a nucleotide. PII is in complex with the 
desired glutamine and Mg-ATP, although no ATP had been added to the 
crystallization solution (Figure 5B). The presence of ATP, presumably 
originating from a contamination, further strengthens the conclusion 
that CrPII can only bind AtNAGK in the Mg-ATP complex. 

Because most residues important for complex formation are conserved 
between CrPII and AtPII, it is not surprising that the overall 
structure and in particular the conformation of the T loop of the 
CrPII-AtNAGK and AtPII-AtNAGK (2RD5) complexes are nearly identical. A 
superposition of PII trimers from both complexes yields an rmsd of 
0.79 Å over 372 aligned Cα positions, which supports the significance 
of the CrPII conformation in the heterologous complex. 

The remarkable and important difference between AtPII and CrPII in 
complex with NAGK concerns the C-terminal extension. In 
glutamine-bound CrPII, the extension forms two additional short 
α-helical segments connected by a small loop, from Lys131 to Gly135 
(Figure 5C). This extension constitutes a large portion of the 
glutamine-binding site. Glutamine mainly forms hydrogen bonds to 
main-chain atoms of this extension and to Arg43, Gly44, and Val82 
(Figure 5D). Additional hydrogen bonds 
Figure 3. CrPII-CrNAGK Complex Formation Analyzed by SPR Spectroscopy 

CrPII with effectors was injected on immobilized CrNAGK in flow cell 
2; the response difference between FC2 and the control FC1 is shown. 

(A) Glutamine-dependent binding of CrPII to CrNAGK at 0 (gray line), 1 
mM (dot-dashed line), 2.5 mM (dot-dot-dashed line), 5 mM (dotted 
line), and 7.5 mM (black line) glutamine. 

(B) ATP-dependent complex formation at 5 mM glutamine. Lines: black, 1 
mM ATP/2 mM MgCl2; dotted, 1 mM ADP; dash-dotted, 1 mM ADP/2 mM MgCl2; 
dot-dot-dashed, 1 mM ADP/2 mM MgCl2/5 mM glutamine. 

(C) Influence of 2-OG on CrPII-CrNAGK interaction at 5 mM glutamine. 
Gray line, no 2-OG; other lines, 0.5, 1, 3, 5, 10, or 20 mM 2-OG. 



are formed with the side chains of Glu130 and Gln87. The structure of 
the unique C-terminal loop (from Lys131 to Gly135), which we term the 
Q loop, is further stabilized by contacts with a region from the basal 
part of the T loop, which also takes part in effector molecule binding 
(Figure 6A). Two hydrogen bonds are formed between the Q loop and T 
loop residues Gln56 and Gly57. Of particular interest is Gln56 
(corresponding to Gln39 in bacterial PII proteins), as it is a key 
residue for the structure of the T loop, switching between different 
PII structural states (Huergo et al., 2013; Zeth et al., 2014). It 
seems therefore that its interactions with the Q loop help to bring 
the T loop into its flexed conformation, required for complex 
formation with NAGK. 

Functional Implications of Glutamine Binding 

Although the crystal structure reveals the binding site of glutamine, 
it does not mechanistically explain why glutamine is essential for 
CrPII complex formation. Either the C-terminal segment acts as an 
antideterminant, preventing binding to NAGK, or the Q loop in the 
glutamine-bound state supports complex formation. To distinguish 
between these possibilities, the C-terminal segment of CrPII was 
deleted, shortening the protein to the canonical length of bacterial 
PII proteins. The recombinant truncated CrPII protein (CrPII-ΔC) has 
the expected size of a trimeric PII protein (Figures S3A and S3B). 
Furthermore, the truncated protein retained the core function of PII 
proteins, as deduced from 2-OG-binding assays using ITC. The isotherms 
could be fitted to a one-site binding model with a Kd of 90 μM, as 
compared to 39 μM for the full-length protein (Figures S3C and S3D). 
Interaction of CrPII-ΔC with NAGK was tested in enzyme assays (Figure 
2A) as well as in SPR experiments (Figure S3E). No interaction was 
detected in any assay, either in the presence or absence of 
glutamine, which indicated that the glutamine-binding C-terminal 
segment is actively required for CrPII to bind to NAGK in a 
glutamine-dependent manner. 
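
To put the two dissociation constants in perspective, the sketch below evaluates the equilibrium occupancy predicted by a one-site binding model. It is a simplification: it assumes the free 2-OG concentration equals the total concentration and ignores the heat terms that an actual ITC fit would include. The Kd values are those quoted above; the ligand concentrations are illustrative.

```python
def fraction_bound(ligand_um, kd_um):
    """One-site equilibrium occupancy: theta = [L] / (Kd + [L])."""
    return ligand_um / (kd_um + ligand_um)


KD_TRUNCATED_UM = 90.0    # truncated protein lacking the C-terminal segment
KD_FULL_LENGTH_UM = 39.0  # full-length CrPII

# Occupancy of both variants across a hypothetical 2-OG series (uM).
for ligand in (10.0, 39.0, 90.0, 500.0):
    full = fraction_bound(ligand, KD_FULL_LENGTH_UM)
    truncated = fraction_bound(ligand, KD_TRUNCATED_UM)
    print(f"{ligand:>6.0f} uM 2-OG: full-length {full:.2f}, truncated {truncated:.2f}")
```

The roughly 2-fold difference in Kd shifts the binding curve only modestly, which is consistent with the statement that the truncated protein retains 2-OG binding.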

Glutamine Sensing Is a General Property of Plant 
PII Proteins 

Sequence alignment of plant PII proteins (Figure 6C; an extended list 
is in Figure S4) reveals that the C-terminal residues of the Q loop 
are part of a conserved motif present in all plant sequences that we 
analyzed, except the Brassicaceae family, to which A. thaliana 
belongs. The Q loop in members of the Brassicaceae family exhibits a 
deletion of three amino acids, which might help the Q loop to fold in 
a glutamine-independent manner (Figure 6B). To find out whether the 
missing glutamine response of A. thaliana PII is a consequence of this 
genetic trait, we exchanged the 19 amino acid Q loop extension of 
CrPII (see Figure 6C) with the 15 amino acid C terminus of AtPII, and 
vice versa, to obtain chimeric CrPII with the A. thaliana C terminus 
(CrPII/AtQ) and chimeric AtPII with the C. reinhardtii Q loop terminus 
(AtPII/CrQ). The CrPII/AtQ chimera only very weakly activated CrNAGK 
(Figure 7A), which indicated that the C terminus of A. thaliana PII 
cannot efficiently replace that of C. reinhardtii, and that this 
replacement completely abrogated the glutamine response. Conversely, 
the AtPII/CrQ chimera was functional and, moreover, showed partial 
glutamine dependence (Figure 7B). Therefore, the inability of A. 
thaliana PII to respond to glutamine was indeed caused by the 
degenerated C terminus. The conserved Q loop C terminus of plant PII 
proteins not belonging to the Brassicaceae strongly implies that 
glutamine sensing might be a general feature of PII function in 
plants. To test this assumption, we prepared recombinant PII proteins 
from two very diverse plants, the moss Physcomitrella patens and the 
rice Oryza sativa. To test the glutamine response, these PII proteins 
were added to arginine-feedback-inhibited AtNAGK with increasing 
concentrations of glutamine. Both PII proteins were able to release 
AtNAGK from arginine-feedback inhibition in a glutamine-dependent 
manner: Physcomitrella PII with an EC50 of 6.6 mM, and Oryza PII with 
an EC50 of 9.2 mM (Figure 7C). This glutamine response is similar to 
that of CrPII (Figure 2B), which confirmed that glutamine sensing is a 
general property of plant PII proteins. 

DISCUSSION 

Maintenance of a proper carbon-nitrogen balance is pivotal 
for plant growth and development. A functional understanding 
of the underlying regulatory mechanisms is fundamental to 
improvement of crop yield and resistance (Nunes-Nesi et al., 
2010; Lea and Miflin, 2010). In higher plants, the signal-trans- 
duction network associated with nitrogen assimilation is 
extremely complex, integrating signals from hormones, nitrate, 
sugars, organic acids, and amino acids (Foyer et al., 2003; 
Nunes-Nesi et al., 2010; Zheng, 2009). Glutamine, as the primary 
product of ammonium assimilation, is of particular importance, 
and is one of the most abundant amino acids of light-grown 
plants (Fritz et al., 2006). Ammonium is provided by nitrate 
reduction, photorespiration, and amino acid recycling, and is 
mainly assimilated in the chloroplast by glutamine synthetase 
(Keys, 2006; Lea and Miflin, 2010). Glutamine donates nitrogen 
groups, directly or via glutamate, for essentially all nitrogenous 
cell compounds. 



Figure 4. Side and Top Views of the Trimer Structure of CrPII in the 
Apo State and with Bound Mg-ATP and 2-OG 

In the apo structure (A), sulfate ions from the reservoir solution are 
bound in place of the ATP γ-phosphate in all three monomers. Upon 
soaking these crystals with Mg-ATP and 2-OG (B), Mg-ATP is bound with 
full occupancy in two monomers; 2-OG is bound fully in one monomer, 
with half-occupancy in the other. The third monomer remains in the 
sulfate-bound apo state. The T loop and the C-terminal extension 
including the Q loop are unstructured in all states. Data collection 
and refinement statistics for the structures are shown in Table S1. 



The present study identified plant PII proteins as glutamine sensors. 
PII proteins were known to be localized in chloroplasts and to 
regulate a key enzyme of arginine biosynthesis, NAGK, in response to 
2-OG level (Uhrig et al., 2009). From biochemical analyses of 
Synechococcus and Arabidopsis PII and NAGK proteins, it seemed that 
PII-NAGK interaction is highly conserved from cyanobacteria to higher 
plants: PII enhances the catalytic activity of 
arginine-feedback-inhibited NAGK when PII is ATP bound and not in 
complex with 2-OG and, conversely, ADP or 2-OG antagonizes productive 
interaction between PII and NAGK (Beez et al., 2009; Mizuno et al., 
2007b). From the conservation of critical amino acids in PII and NAGK 
proteins involved in complex formation, we assumed that this 
interaction would be conserved in green algae, the phylogenetic 
predecessors of higher plants. Surprisingly, the Chlamydomonas PII 
protein required glutamine in millimolar concentrations to interact 
with NAGK. Moreover, PII proteins from the moss Physcomitrella and 
from the rice Oryza both showed a glutamine requirement similar to 
Chlamydomonas PII. By contrast, A. thaliana PII acted independent of 
glutamine. The slight truncation of the AtPII C-terminal extension 
turned out to be responsible for the divergent property. 

Explaining mechanistically why AtPII, in contrast to the other 
PII proteins, binds to NAGK independently of glutamine requires an 
in-depth comparison of the structures. The C-terminal extension 
of CrPII forms a small helix-Q loop-helix structure, which wraps 
around the bound glutamine molecule. The bound glutamine 
acts like a pillar that puts the Q loop in place through its carboxyl 
group and holds the Q loop through main-chain interactions. 
Backbone interactions from the Q loop with Gln56 and Gly57 
from the base of the T loop stabilize the highly flexible T loop 
and Q loop structures. In agreement, in crystals of CrPII not 
ligated to NAGK, these structures are disordered, which suggests 
that the flexible T loop and Q loop become structured 
upon NAGK binding. This agrees with the fact that NAGK makes 
its major contacts with the T loop of PII. A CrPII variant with 
the C terminus truncated to the length of bacterial PII proteins 
was unable to interact with NAGK, although it appeared to be 
an intact PII protein with respect to ternary structure and 2-OG 



Cell 159, 1188-1199, November 20, 2014 ©2014 Elsevier Inc. 1193 






[Figure 5 panel labels: (B) C-extension with Q loop, T loop, B loop; (C) Q loop within the C-terminal sequence GLEAEKMEGGMEDMMKKKK] 



binding. This suggests that the T loop in the absence of the Q 
loop cannot adopt the structure that binds NAGK. The interactions 
between the Q loop and the T loop are apparently necessary 
to stabilize the structure. The same stabilizing interactions 
between the C terminus and the T loop are found in the 
A. thaliana PII-NAGK complex (see Figure 6B). However, in this 
case, no binding of glutamine is required. Owing to the deletion 
of three amino acids at the Q loop, an aspartate residue (Asp129) 
projects into the glutamine-binding pocket and a hydrogen-bonding 
network with water molecules fills the cavity. A key residue 
for determining the structure of the T loop in CrPII is Gln56 
(corresponding to Gln50 in AtPII and Gln39 in S. elongatus PII). 
The importance of this residue for stabilizing T loop conformations 
was originally recognized in the E. coli PII protein GlnB (Jiang 



Figure 5. Glutamine-Bound CrPII in Complex 
with AtNAGK 

(A) PII monomers are individually colored; NAGK is 
in two shades of gray. In the PII trimer on top, one 
C-terminal extension is highlighted in red. 

(B) Side and top views of the PII trimer from the 
complex. The T loop, B loop, and whole C-terminal 
extension are individually colored. The whole 
C-terminal extension, consisting of two α helices 
connected by the Q loop, constitutes a large 
portion of the glutamine-binding site. 

(C) The sequence of the C-terminal extension, 
highlighting the consensus motif. 

(D) A stereo close-up of the glutamine-binding site 
reveals a sophisticated hydrogen-bonding network. 
Electron density for glutamine is shown 
as an Fo − Fc omit map contoured at 4σ. Data 
collection and refinement statistics of the 
CrPII-AtNAGK complex are shown in Table S1. 



et al., 1997). In bacterial PII proteins, 
different Gln39 contacts switch the T 
loop into various states (Huergo et al., 
2013; Zeth et al., 2014). In both the CrPII 
and AtPII complexes with AtNAGK, the 
corresponding glutamine side chain 
forms a bridge between the γ-phosphate 
oxygen of the bound ATP molecule and 
the backbone of Met132 in CrPII (Met126 in 
AtPII), thereby stabilizing the tightly flexed 
T loop conformation. As observed here 
for the CrPII-AtNAGK structure, all PII 
sites of the A. thaliana PII-NAGK structure 
are occupied by ATP, although no ATP 
was added to the crystallization solution; 
it must have been acquired from 
contaminating ATP in the ADP solution 
(Mizuno et al., 2007b). This demonstrates 
the strict ATP requirement for complex 
formation. This can be explained if we 
assume that the Gln56 interaction between 
ATP and the C-terminal backbone is 
necessary to fix the T loop in the flexed 
conformation necessary for insertion 
into the crevice of NAGK. In cyanobacteria, the C-terminal Q 
loop extension does not exist but, nevertheless, PII avidly 
interacts with NAGK. Moreover, this interaction does not require 
ATP, which indicates that in this case the T loop can fold into 
the NAGK-fitting structure without engaging the Gln39-ATP 
contact. Indeed, in the S. elongatus PII-NAGK structure, which is 
devoid of ATP, Gln39 is not involved in any contacts (Llacer 
et al., 2007). 

Of 55 examined PII C-terminal sequences of plants including 
green algae (Figure S4), only the 5 representatives of the 
Brassicaceae family have the deletion found in Arabidopsis, and of the 
50 remaining cases, only 3 have significant alterations in the Q 
loop consensus motif. From this almost invariant conservation, 
we deduced that the formation of the PII-NAGK complex in 
Figure 6. Detailed View of the Interactions between the Q Loop and T Loop in PII from C. reinhardtii and A. thaliana 

(A and B) C. reinhardtii (A) and A. thaliana (B). In both complexes, the structured Q loop forms the same stabilizing hydrogen bonds with the γ-phosphate-coordinating 
glutamine in the T loop (Q56 in CrPII, Q50 in AtPII) and the subsequent glycine (G57/G51). Structurally, the twin glycine residues (G134 and G135 in 
CrPII) of the Q loop provide space for the glutamine ligand. Owing to the deletion of three amino acids, including the second glycine, the glutamine-binding site 
cannot form in AtPII, as the first glycine is directly followed by Asp129. With the involvement of three bridging water molecules mimicking the bound ligand, this 
stunted Q loop can fold independently of glutamine binding, whereas in CrPII the folding of the whole C extension depends on glutamine binding. The three water 
molecules in AtPII were identified in a reevaluation of the experimental data (see Extended Experimental Procedures). 

(C) Sequence alignment of PII C termini of plants and cyanobacteria, highlighting the Q loop consensus motif (bold) and the deletion of three amino acids in the 
Brassicaceae family. Color labeling shows secondary structure assignment (strands, cyan; helices, pink) according to the CrPII-AtNAGK and AtPII-AtNAGK 
complex structures. An extended alignment including 55 plant PII sequences is shown in Figure S4. 



[Figure 6C alignment labels] 
Green plants (full extension): Chlamydomonas reinhardtii, Physcomitrella patens, Oryza sativa, Picea sitchensis, Nicotiana tabacum 
Brassicaceae (degenerate extension): Arabidopsis thaliana, Capsella rubella 
Cyanobacteria (no extension): Nostoc punctiforme, Synechococcus elongatus 


plants generally requires binding of glutamine to the Q loop, and 
this was confirmed by examination of Physcomitrella and Oryza 
PII proteins. The half-maximal effective concentration of 
glutamine was determined to be in the millimolar range, which likely 
reflects the Kd of the glutamine-binding site (EC50 of 2.4 mM, 
9.2 mM, and 6.6 mM for Chlamydomonas PII, Physcomitrella 
PII, and Oryza PII, respectively). Notably, these glutamine 
concentrations are within the range of the estimated concentrations 
(2.5-20 mM) of glutamine in different plant species such as 
tobacco, spinach, and barley (Fritz et al., 2006; Riens et al., 
1991; Winter et al., 1993, 1994). Therefore, the low-affinity 
glutamine-binding mode of the PII Q loop is perfectly tuned for 
sensing physiologically relevant glutamine levels, which strongly 
implies regulatory significance. Only when the glutamine 
concentration in the chloroplast is in the range of several millimolar 
could PII efficiently activate NAGK and antagonize feedback 
Figure 7. The C-Terminal Segment of PII Proteins from Plants Other 
Than A. thaliana Mediates PII Signaling toward NAGK 

(A and B) Chimeric PII proteins consisting of (A) CrPII with the C-terminal Q loop 
of A. thaliana (CrPII/AtQ) and (B) A. thaliana PII with the C-terminal Q loop of 
C. reinhardtii (AtPII/CrQ) were tested for glutamine-dependent activation of 
cognate arginine-feedback-inhibited NAGK. 



inhibition by arginine. Indeed, glutamine levels in plants strongly 
respond to nitrogen status, with amplitude changes of almost 
200-fold reported in tobacco leaves (Fritz et al., 2006). In 
Chlamydomonas cells, glutamine levels are 10-fold lower under 
nitrogen deficiency (Bolling and Fiehn, 2005), accompanied by a 
similar decline in metabolites of the ornithine/arginine synthesis 
pathway. Thus, the flow of amino acids into this pathway, which 
feeds into the arginine pool as well as into polyamines, appears 
to be controlled by the glutamine level via the PII protein. Arginine 
has multiple metabolic roles in addition to its role as a building 
block in protein biosynthesis, and can also be used as a 
nitrogen-storage molecule (Llacer et al., 2008). Only at high glutamine 
levels, which indicate sufficient ammonia supply, is synthesis of 
nitrogen-storage compounds beneficial. 
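
The relationship between the reported EC50 values and the physiological glutamine range can be illustrated with a short calculation. The sketch below is not part of the study's analysis: it assumes a simple Hill-type dose-response with a Hill coefficient of 1, which the text does not state, and the function and variable names are hypothetical. Only the EC50 values and the 2.5-20 mM range are taken from the text.

```python
# EC50 values (mM) reported in the text for glutamine-dependent NAGK activation.
EC50_MM = {"CrPII": 2.4, "PpPII": 9.2, "OsPII": 6.6}


def fractional_activation(gln_mm, ec50_mm, hill=1.0):
    """Fraction of maximal PII-dependent activation at a glutamine level.

    ASSUMPTION: a Hill-type dose-response curve; hill=1.0 is an
    illustrative choice, not a value reported in the text.
    """
    if gln_mm < 0 or ec50_mm <= 0:
        raise ValueError("concentrations must be non-negative, EC50 positive")
    return gln_mm ** hill / (ec50_mm ** hill + gln_mm ** hill)


if __name__ == "__main__":
    # Evaluate each protein at the ends of the estimated physiological range.
    for name, ec50 in EC50_MM.items():
        low = fractional_activation(2.5, ec50)
        high = fractional_activation(20.0, ec50)
        print(f"{name}: {low:.2f} at 2.5 mM, {high:.2f} at 20 mM")
```

Under this assumed curve, each EC50 falls inside the physiological range, so activation changes substantially as glutamine moves across that range, which is the sense in which the sensor is tuned.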

An intriguing question concerns the evolution of Brassicaceae 
PII proteins. A three amino acid deletion at the Q loop has 
apparently occurred to generate a glutamine-insensitive variant. 
Grafting the Q loop from Chlamydomonas onto the A. thaliana PII body 
restored glutamine sensitivity at least partially, and shows that 
A. thaliana PII can be converted back to a glutamine-sensing 
protein. By contrast, the A. thaliana C terminus transplanted onto 
the CrPII body was almost nonfunctional, which indicates that 
additional modifications in the PII body are required to evolve a 
functional glutamine-insensitive PII protein. Although glutamine 
sensing is lost in the Brassicaceae, their PII protein is still 
responsible for the regulation of the ornithine/arginine synthesis 
pathway; accordingly, PII knockdown lines of A. thaliana show 
impaired accumulation of metabolites from the ornithine/arginine 
pathway (Ferrario-Mery et al., 2006). The selective pressure that 
resulted in glutamine-insensitive Brassicaceae PII proteins is 
unknown, but it could point toward a special feature of nitrogen 
metabolism in this plant family. 

Glutamine is the primary nitrogen status reporter in many 
bacteria (Forchhammer, 2007). There, information about the cellular 
glutamine status is perceived by the PII signaling system via 
glutamine-sensitive modification enzymes, such as the 
uridylyltransferase/uridylyl-removing enzyme in E. coli (Adler et al., 
1975; Jiang and Ninfa, 2011). In the development of green algal 
chloroplasts from cyanobacterial ancestors, evolution has 
reinvented glutamine sensing by PII signal transducers using a 
modular concept: a small extension was attached to the C 
terminus, which controls the interaction of PII with receptors through a 
subtle network of interprotein interactions for which binding of 
glutamine is necessary. This is a striking example of convergent 



(A) C. reinhardtii NAGK with or without the CrPII/AtQ chimera was assayed 
under standard conditions containing 400 μM arginine in the absence or 
presence of 5 or 20 mM glutamine. 

(B) Glutamine-dependent activation of arginine-feedback-inhibited AtNAGK 
by the AtPII/CrQ chimera. 

(C) Glutamine-dependent activation of arginine-feedback-inhibited AtNAGK 
by Physcomitrella PII (PpPII) or Oryza PII (OsPII). 

Assays in (B) and (C) contained the NAGK feedback inhibitor arginine at 20 mM 
and increasing concentrations of glutamine, as indicated. The activity of NAGK 
in the absence of any PII protein (without PII) at 0 and 5 mM glutamine (B) and 
0 and 30 mM glutamine (C) is indicated. 

The standard deviation from triplicate measurements is indicated by error 
bars. 
evolution in one of nature’s most abundant signaling proteins. By 
the identification of PII as a glutamine sensor, this work opens 
the way to a deeper understanding of nitrogen homeostasis in 
plants and may help optimize plant breeding for crop yield 
improvement. 

EXPERIMENTAL PROCEDURES 

Detailed methods are given in Extended Experimental Procedures. 

Cloning, Expression, and Purification of PII Proteins 

Recombinant PII from A. thaliana (AtPII) and from C. reinhardtii (CrPII) with a 
C-terminally fused Strep-tag II sequence were overexpressed and purified as 
described previously (Beez et al., 2009; Ermilova et al., 2013). The 
CrPII-ΔC-coding gene was constructed using standard PCR techniques. The AtPII/CrQ 
chimera, P. patens PpPII, and O. sativa OsPII genes were commercially 
synthesized. All genes were cloned into the Strep-tag fusion vector pASK-IBA3plus. 
Overexpression and purification of the PII proteins by affinity chromatography 
on Strep-Tactin columns were described previously (Heinrich et al., 2004). 
Biochemical properties of the purified CrPII and CrPII-ΔC proteins are shown 
in Figure S3. 

Cloning, Expression, and Purification of NAGK Proteins 

Recombinant AtNAGK was overexpressed and purified as described 
previously (Beez et al., 2009). For recombinant CrNAGK, a gene corresponding to 
mature chloroplast-localized CrNAGK was commercially synthesized, cloned 
into vector pET15b, and purified as described for recombinant AtNAGK. Analysis of purified 
CrNAGK and AtNAGK by gel filtration is shown in Figure S2. A multiple 
sequence alignment of plant and cyanobacterial NAGK proteins, including 
predicted chloroplast transit peptides for CrNAGK and AtNAGK, is shown in 
Figure S1. 

Surface Plasmon Resonance Analysis 

SPR experiments were performed using a Biacore X biosensor system 
at 25°C in HEPES-buffered saline (HBS buffer) as described previously 
(Maheswaran et al., 2004). His6-NAGK was immobilized on flow cell (FC) 
2 to a density of approximately 3,000 resonance units (RUs). The analyte 
solutions contained PII proteins (100 nM) in HBS buffer and were incubated 
with various effector molecules on ice for 5 min and then injected (50 μl) 
into FC1 (control) and FC2 of the sensor chip. The specific binding of PII 
to NAGK was recorded as the response signal difference FC2 - FC1 
(ΔRU). 

Isothermal Titration Calorimetry 

ITC experiments were performed on a MicroCal VP-ITC instrument in 
10 mM potassium phosphate (pH 7.5), 100 mM NaCl, and 2 mM MgCl2 
at 20°C. 

Enzymatic Assay for NAGK Activity 

A coupled enzyme assay was used to determine NAGK activity in which the 
production of ADP was coupled to the oxidation of NADH (Beez et al., 
2009). The reaction mixture contained 2.4 μg of PII protein; the reaction was 
started by adding 3 μg of NAGK. Means of triplicate experimental 
determinations are shown. From the velocity slopes, the catalytic constants were 
calculated using the GraphPad Prism 6.01 software program. 
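
As an illustration of how a velocity slope from such a coupled assay converts into an enzyme activity, the following sketch applies the Beer-Lambert law. It is a generic calculation, not the procedure used in the study: the text does not state the detection wavelength, extinction coefficient, path length, or reaction volume, so the sketch assumes NADH detection at 340 nm with an extinction coefficient of 6220 M^-1 cm^-1, a 1 cm path, and one NADH oxidized per ADP produced. All function names and the example numbers other than the 3 μg of NAGK are hypothetical.

```python
# ASSUMPTION: NADH is monitored at 340 nm, where its molar extinction
# coefficient is 6220 M^-1 cm^-1. The text does not give these details.
NADH_EXT_COEFF_M_CM = 6220.0


def activity_umol_per_min(slope_abs_per_min, volume_ml, path_cm=1.0):
    """Convert an absorbance slope (A340 per min) to umol NADH oxidized per min.

    ASSUMPTION: one NADH is oxidized per ADP produced (1:1 coupling).
    """
    # Beer-Lambert: concentration change (M/min) = dA/dt / (epsilon * path).
    rate_molar_per_min = abs(slope_abs_per_min) / (NADH_EXT_COEFF_M_CM * path_cm)
    # mol/min = M/min * litres; then convert mol to umol.
    return rate_molar_per_min * (volume_ml / 1000.0) * 1e6


def specific_activity(slope_abs_per_min, volume_ml, enzyme_ug, path_cm=1.0):
    """Specific activity in umol per min per mg of enzyme."""
    if enzyme_ug <= 0:
        raise ValueError("enzyme amount must be positive")
    total = activity_umol_per_min(slope_abs_per_min, volume_ml, path_cm)
    return total / (enzyme_ug / 1000.0)


if __name__ == "__main__":
    # Hypothetical slope and volume; 3 ug of NAGK as in the assay description.
    print(specific_activity(slope_abs_per_min=-0.0622, volume_ml=1.0, enzyme_ug=3.0))
```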

Crystallization, Sample Preparation, Data Collection, and Structure 
Determination 

Crystallization trials were performed in a standard vapor-diffusion setup. 
Crystals of the CrPII apo form grew in a solution containing 3 mg/ml CrPII in 10 mM 
Tris (pH 7.5), 100 mM NaCl, 2 mM ATP, 2 mM MgCl2, 5 mM Gln, and 10% (v/v) 
glycerol and a reservoir solution containing 0.15 M (NH4)2SO4, 0.1 M HEPES 
(pH 7.0), and 20% (w/v) PEG 4000. For the preparation of the 
CrPII-Mg-ATP-2-OG complex, these crystals were soaked in a droplet of reservoir 
solution supplemented with 10 mM ATP, 10 mM MgCl2, 10 mM 2-OG, and 20% 
(v/v) glycerol. Crystals of the CrPII-AtNAGK complex grew in a solution 
containing 2 mg/ml protein (with a 1:2 molar ratio of CrPII:AtNAGK) in 10 mM 
Tris (pH 7.8), 100 mM NaCl, 40 mM Arg, 10 mM Gln, 2 mM MgCl2, 2 mM 
ADP, 10 mM NAG, and 5% (v/v) glycerol and a reservoir solution containing 
200 mM NaCl, 100 mM Na/K phosphate (pH 6.2), and 50% (v/v) PEG 200. 
Data were collected under cryogenic conditions. 

The CrPII structure was solved by molecular replacement using the SePII 
structure 2XZW. The CrPII-AtNAGK complex structure was solved on the basis 
of the AtPII-AtNAGK structure 2RD5. Data processing and refinement 
statistics are summarized in Table S1. 
SUMMARY 

Ribosomes elongate at a nonuniform rate during 
translation. Theoretical models and experiments 
disagree on the in vivo determinants of elongation 
rate and the mechanism by which elongation rate 
affects protein levels. To resolve this conflict, we 
measured transcriptome-wide ribosome occupancy 
under multiple conditions and used it to formulate a 
whole-cell model of translation in E. coli. Our model 
predicts that elongation rates at most codons during 
nutrient-rich growth are not limited by the intracel- 
lular concentrations of aminoacyl-tRNAs. However, 
elongation pausing during starvation for single amino 
acids is highly sensitive to the kinetics of tRNA ami- 
noacylation. We further show that translation abor- 
tion upon pausing accounts for the observed ribo- 
some occupancy along mRNAs during starvation. 
Abortion reduces global protein synthesis, but it 
enhances the translation of a subset of mRNAs. 
These results suggest a regulatory role for aminoacy- 
lation and abortion during stress, and our study pro- 
vides an experimentally constrained framework for 
modeling translation. 

INTRODUCTION 

Protein synthesis begins with initiation by ribosomes on an 
mRNA and is followed by a sequence of elongation steps dur- 
ing which amino acids are added to the growing polypeptide 
chain. Initiation is the rate-limiting step for the translation of 
most mRNAs during nutrient-rich growth (Jacques and Dreyfus, 
1990). However, stressful perturbations such as amino acid 
starvation or transgene overexpression can decrease the elon- 
gation rate of ribosomes and affect protein levels (Subrama- 
niam et al., 2013a; Varenne et al., 1984; Welch et al., 2009). 
Biophysical modeling of translation can be used to infer the 
quantitative effect of these stressful perturbations on initiation 
rate, elongation rate, and the expression level of proteins 
(Shah et al., 2013). More generally, modeling integrates our 
biochemical knowledge of translation and thus enables 
identification of novel regulatory processes when incorporation of 
known mechanisms is insufficient to recapitulate experimental 
measurements. 

Biophysical modeling of translation has been greatly aided by 
the development of ribosome profiling, which involves deep 
sequencing of ribosome-protected mRNA fragments to measure 
ribosome occupancy on mRNAs (Ingolia et al., 2009). Ribosome 
profiling has also produced surprising results that challenge 
two central assumptions in current theoretical models of trans- 
lation. First, theoretical models assume that elongation rates 
at codons are directly proportional to the intracellular concentra- 
tion of cognate aminoacyl-tRNAs (aa-tRNAs) during nutrient-rich 
growth (Shah et al., 2013; Tuller et al., 2010). However, this 
assumption is not supported by ribosome profiling: although 
concentrations of different tRNAs vary over greater than a 10- 
fold range in bacteria and yeast (Dong et al., 1996; Tuller et al., 
2010), the measured ribosome occupancy at codons varies 
less than 2-fold in these organisms during nutrient-rich growth 
(Li et al., 2012; Qian et al., 2012). Differential aminoacylation of 
tRNAs is also unlikely to underlie this discrepancy, as most 
tRNA species are >70% aminoacylated during nutrient-rich 
growth (Yegian et al., 1966). Second, most models assume 
that a decrease in ribosome elongation rate on an mRNA affects 
the level of the corresponding protein by causing a traffic jam of 
trailing ribosomes (Shah et al., 2013; Zhang et al., 1994). However, 
except for a few atypical mRNAs (Guydosh and Green, 2014; 
Li et al., 2012), traffic jams at ribosome pause sites have not been 
observed in vivo, and the effect of ribosome traffic jams on pro- 
tein level remains unclear. Together, these results suggest that 
current theoretical models do not include the full set of mecha- 
nistic ingredients that is necessary for accurate modeling of 
in vivo translation, especially as it pertains to the elongation 
stage. 

To identify mechanistic features of the elongation stage 
that enable accurate biophysical modeling of translation, we 
combined transcriptome-scale and reporter-based experi- 
ments with whole-cell computational modeling. This approach 
enabled us to decipher the contribution of various molecular 
processes to the elongation rate of ribosomes and the synthe- 
sis rate of proteins. We found that the differences in the intra- 
cellular concentration of tRNAs and the occurrence of ribosome 
traffic jams at pause sites, both of which have been key ingre- 
dients in previous theoretical models (Shah et al., 2013; Tuller 

[Figure 1 graphic. Panel A: ribosome occupancy versus distance from start codon (nt) for -Leu, -Ser, and Rich conditions; triangles mark the leucine codons (CUA, CUU, CUC, UUG, UUA, CUG) and serine codons (UCG, UCC, UCA, UCU, AGU, AGC). Panels B and C: codon ribosome occupancy relative to mean, with the Rich condition on the horizontal axis.] 



Figure 1. Changes in Ribosome Occupancy upon Starvation for 
Single Amino Acids 

(A) Measured ribosome occupancy along three E. coli genes: leuL, lptA, and 
fabI during leucine starvation (-Leu), serine starvation (-Ser), and 
amino-acid-rich growth (Rich). The horizontal axis extends from the start codon to the stop 
codon for each gene. Triangles indicate the positions of leucine (serine) co- 
dons along the coding sequence in the leucine (serine) starvation case. 

(B and C) Measured ribosome occupancy at the 61 sense codons averaged 
across the transcriptome. Start and stop codons are not shown. Standard 
errors of mean are smaller than data markers. 

See also Figure S1. 



et al., 2010), are insufficient to predict the measured ribosome 
occupancy in our experiments. We find that the molecular pro- 
cess of aminoacylation has a critical role in protein synthesis 



through modulation of ribosome elongation rates during stress. 
Further, we find that premature termination of translation before 
the synthesis of full-length proteins (henceforth referred to as 
translation abortion) determines both the ribosome occupancy 
along mRNAs and protein expression during stress. More 
generally, our work illustrates the usefulness of integrating 
deep-sequencing experimental methods such as ribosome 
profiling together with quantitative whole-cell modeling to 
reveal the experimentally relevant regimes of the a priori large 
parameter space in biophysical models of complex cellular 
processes. 

RESULTS 

Changes in Ribosome Occupancy upon Starvation for 
Single Amino Acids 

We previously found that, during starvation for single amino 
acids in E. coli, the presence of certain codons cognate to the 
limiting amino acid can decrease the protein synthesis rate by 
up to 100-fold (Subramaniam et al., 2013a). Starvation for single 
amino acids decreases the concentration of the cognate aminoacyl 
(aa) tRNAs (Dittmar et al., 2005; Sorensen, 2001; Sorensen 
et al., 2005), thus decreasing the elongation rate of ribosomes at 
cognate codons. Based on these observations, we used single 
amino acid starvation as an experimental condition to develop 
constraints for biophysical models of protein synthesis in an 
elongation-limited regime of translation. 

To characterize the effect of amino acid starvation on trans- 
lation, we performed ribosome profiling in E. coli after 30 min of 
starvation for each of two amino acids: leucine and serine. For 
comparison with an initiation-limited regime of translation, we 
also performed ribosome profiling on cells grown in rich- 
defined medium with all 20 amino acids. The distribution of 
ribosome footprints along mRNAs was highly variable during 
nutrient-rich growth (Figure 1A), consistent with earlier 
observations (Oh et al., 2011). Starvation for either leucine or serine 
caused a pronounced change in the distribution of ribosome 
footprints along individual mRNAs (Figure 1A). Consistent with 
previous observations (Li et al., 2012), the transcriptome-aver- 
aged ribosome occupancy differed less than 2-fold across the 
61 codons during nutrient-rich growth (Figures 1B and 1C, hor- 
izontal axis) and did not systematically vary with tRNA abun- 
dance (Figure S1A available online). Upon leucine or serine 
starvation, the average ribosome occupancy increased at 
leucine and serine codons, respectively, but this increase was 
not uniform (Figures 1B and 1C, vertical axis). The three leucine 
codons CUA, CUC, and CUU had 2.5- to 4-fold higher ribo- 
some occupancy than the average during leucine starvation, 
whereas the ribosome occupancy at the remaining three 
leucine codons CUG, UUA, and UUG was comparable to that 
of noncognate codons. Similarly, ribosome occupancy at the 
four serine codons UCA, UCC, UCG, and UCU was markedly 
higher than at the two serine codons AGC and AGU during 
serine starvation. We attributed the increase in ribosome occu- 
pancy at leucine and serine codons to a decrease in the ribo- 
some elongation rate, as we did not observe an increase in to- 
tal mRNA density at these codons upon leucine or serine 
starvation (Figures S1B and S1C). 
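
One common way to obtain a transcriptome-averaged codon occupancy from footprint counts is sketched below. This is an assumed procedure for illustration, not necessarily the exact pipeline used here: each gene's per-codon footprint counts are divided by that gene's mean count, and the normalized values are then averaged over all occurrences of each codon. The function name and the toy data are hypothetical.

```python
from collections import defaultdict


def codon_occupancy(genes):
    """Average gene-normalized ribosome occupancy for each codon.

    genes: iterable of (codons, counts) pairs, one pair per gene, where
    codons is a list of codon strings and counts holds the footprint
    count at each of those codons.
    ASSUMPTION: normalizing by the per-gene mean removes differences in
    expression level and initiation rate between genes.
    """
    sums = defaultdict(float)
    occurrences = defaultdict(int)
    for codons, counts in genes:
        if len(codons) != len(counts):
            raise ValueError("codons and counts must have the same length")
        gene_mean = sum(counts) / len(counts) if counts else 0.0
        if gene_mean == 0:
            continue  # skip genes with no footprints
        for codon, count in zip(codons, counts):
            sums[codon] += count / gene_mean
            occurrences[codon] += 1
    return {codon: sums[codon] / occurrences[codon] for codon in sums}


if __name__ == "__main__":
    # Two toy genes with made-up counts; CUA is the paused codon.
    toy = [
        (["AUG", "CUA", "GCU", "CUG"], [2, 8, 2, 4]),
        (["CUA", "GCU"], [30, 10]),
    ]
    print(codon_occupancy(toy))
```

With this normalization, a value above 1 means ribosomes dwell longer at that codon than at the average codon of the same mRNA, which corresponds to the "relative to mean" quantity discussed above.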
Figure 2. A Transcriptome-Scale Biophysical Model of Translation 

(A) Schematic of the four cellular processes modeled (initiation, elongation, 
aminoacylation, abortion) and the molecular species considered in the bio- 
physical model. 

(B) Reaction rates, R, for the cellular processes shown in (A). The superscripts 
following R refer to the abbreviations for the cellular processes in (A). 
Subscript indices are used for distinct molecular species of the same kind 
(mRNA - p, tRNA - /, codon - a). The intracellular concentrations of molecular 
species and the values for rate constants in our whole-cell simulation are in 
Table S1. 

See also Data S1. 



A Transcriptome-Scale Biophysical Model of 
Translation in E. coli 

To systematically evaluate the consistency of different bio- 
physical models with our measurements of ribosome occu- 
pancy, we formulated a generalized kinetic model of protein 
synthesis in E. coli that accounted for four different molecular 
processes that we found to play a critical role in determining 
ribosome occupancy and protein expression: initiation, elon- 
gation, aminoacylation, and abortion (Figure 2A). Other molec- 
ular processes that are not expected to limit protein synthesis 
rate under the conditions of our study, such as termination 
at stop codons and ribosome recycling, were assumed to 
be instantaneous (rates set to infinity) for the sake of 
simplicity. 

The core features of our biophysical model of protein synthesis 
are summarized by the kinetic rate equations for the four molec- 
ular processes (Figure 2B). The full list of parameter values used 
in our biophysical model is given in Table S1. We implemented 
our model as a continuous-time, discrete-state process at the 



whole-cell level for E. coli by adapting the source code from a 
recent computational study in yeast (Shah et al., 2013). During 
balanced nutrient-rich growth, the availability of free ribosomes 
for initiation is rate limiting for protein synthesis (Vind et al., 
1993). We modeled this empirical observation by taking the 
initiation rate to be proportional to the number of free 
ribosomes in the cell (Equation 1 in Figure 2B). The initiation rate 
constant kp for each mRNA species p was estimated from our 
ribosome profiling measurements and the macromolecular 
composition and synthesis rates that have been measured with 
high accuracy in E. coli (Bremer and Dennis, 2008) (Extended 
Experimental Procedures). 
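
To make the phrase "continuous-time, discrete-state process" concrete, the sketch below runs a Gillespie-type stochastic simulation on a single mRNA. It is a toy illustration, not the whole-cell implementation adapted from Shah et al. (2013): it assumes one mRNA, a ribosome footprint of one codon, instantaneous termination, and no aminoacylation or abortion. The function name, parameter names, and numerical values are hypothetical.

```python
import random


def simulate_mrna(length, k_ini, hop_rates, n_ribosomes, t_end, seed=0):
    """Gillespie simulation of ribosomes translating a single mRNA.

    Initiation fires at rate k_ini * (number of free ribosomes) when the
    first codon is empty. A ribosome at codon i moves forward at rate
    hop_rates[i] unless the next codon is occupied (jamming).
    Returns (proteins_made, free_ribosomes, occupied_sites).
    """
    rng = random.Random(seed)
    occupied = [False] * length
    free = n_ribosomes
    proteins = 0
    t = 0.0
    while True:
        # Collect every reaction that can fire now, with its propensity.
        events = []
        if free > 0 and not occupied[0] and k_ini > 0:
            events.append((k_ini * free, "init", 0))
        for pos in range(length):
            blocked = pos + 1 < length and occupied[pos + 1]
            if occupied[pos] and not blocked and hop_rates[pos] > 0:
                events.append((hop_rates[pos], "hop", pos))
        total = sum(rate for rate, _, _ in events)
        if total == 0:
            break
        # Waiting time to the next reaction is exponentially distributed.
        t += rng.expovariate(total)
        if t > t_end:
            break
        # Choose one reaction with probability proportional to its propensity.
        pick = rng.uniform(0.0, total)
        for rate, kind, pos in events:
            pick -= rate
            if pick <= 0:
                break
        if kind == "init":
            occupied[0] = True
            free -= 1
        elif pos == length - 1:
            # Termination is instantaneous: the ribosome returns to the pool.
            occupied[pos] = False
            free += 1
            proteins += 1
        else:
            occupied[pos] = False
            occupied[pos + 1] = True
    return proteins, free, occupied


if __name__ == "__main__":
    made, free, sites = simulate_mrna(
        length=20, k_ini=0.1, hop_rates=[20.0] * 20,
        n_ribosomes=5, t_end=50.0, seed=1,
    )
    print(made, free, sum(sites))
```

Because the initiation propensity scales with the free-ribosome count, sequestering ribosomes on slowly translated codons lowers initiation, which is the coupling that a whole-cell treatment is meant to capture.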

Intraribosomal Events Limit the Rate of Elongation 
during Nutrient-Rich Growth 

The elongation cycle is a multistep process that begins with the 
delivery of the aa-tRNA (in ternary complex with EF-Tu and GTP) 
to the ribosome. This step is followed by a series of 
intraribosomal events that include kinetic proofreading, peptidyl transfer, 
and translocation of the ribosome to the next codon 
(Wintermeyer et al., 2004). We considered a minimal model of 
elongation (Elf et al., 2003) that is composed of two effective steps 
(Equation 2 in Figure 2B). The first step accounts for the arrival 
of cognate aa-tRNAs at the ribosome, and its rate is proportional 
to the intracellular concentration of aa-tRNA, n^aa. The 
proportionality factor k^el is the second-order rate constant (kcat/KM) for 
association between the ribosome and the aa-tRNA. The second 
step accounts for all intraribosomal events that follow the arrival 
of the cognate aa-tRNA at the ribosome, and the cumulative rate 
of these events does not depend on the intracellular 
concentration of aa-tRNA. For simplicity, we set the total rate of this 
second step to a uniform value 1/τ0 for all codons. Finally, when 
translocation is prevented by the presence of a leading ribosome 
(henceforth referred to as jamming), the elongation rate is set to 
zero in our model. 

If τ0 (the time constant describing all intraribosomal events 
following arrival of the aa-tRNA) has the same value for all 
codons, the experimental observation that ribosome occupancies 
at codons do not vary inversely with the corresponding tRNA 
concentrations during nutrient-rich growth (Figure S1A) now 
implies a straightforward mechanistic constraint in our model: 
the rate of intraribosomal events limits the overall rate of 
ribosome elongation under these conditions, i.e., τ0 ≫ 1/(k^el n^aa) in 
Equation 2 of Figure 2B. Together with the measured average 
elongation rate of ribosomes of ~20 codons per second (Bremer 
and Dennis, 2008), we can then infer that τ0 ≈ 0.05 s. In the more 
general case in which τ0 is not uniform for all codons, most of the 
codons still need to have τ0 ≫ 1/(k^el n^aa) to recapitulate the 
observed lack of (inverse) proportionality between ribosome 
occupancy and tRNA concentration. Our conclusion that 
τ0 ≫ 1/(k^el n^aa), based solely on ribosome occupancy measurements, 
is also consistent with the typical in vitro value of k^el = 2 × 
10^7 M^-1 s^-1 (Bilgin and Ehrenberg, 1994; Pavlov and Ehrenberg, 
1996) and the in vivo concentrations of tRNAs (n^aa) during 
nutrient-rich growth (Bremer and Dennis, 2008; Dong et al., 
1996) (Data S1). Using these values of k^el and n^aa, the median 
value of 1/(k^el n^aa) for the 61 sense codons is 0.0036 s, and all 
codons had 1/(k^el n^aa) < 0.013 s (< τ0 ≈ 0.05 s). 
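
The consequence of this constraint can be checked with a few lines of arithmetic. The sketch below assumes that the two effective steps act in sequence, so that the mean time per codon is the sum of the aa-tRNA arrival time and the intraribosomal time; this additive form is an interpretation of the two-step description, not a transcription of Equation 2. The values 0.05 s, 0.0036 s, and 0.013 s come from the text, while the shortest arrival time (0.0005 s) and all names are hypothetical.

```python
TAU0_S = 0.05  # time for intraribosomal events per codon, as inferred in the text


def dwell_time(arrival_time_s, tau0_s=TAU0_S):
    """Mean time a ribosome spends at a codon.

    ASSUMPTION: the two steps are sequential, so their mean times add.
    arrival_time_s stands for the mean aa-tRNA arrival time at the codon.
    """
    return tau0_s + arrival_time_s


def fold_variation(arrival_times_s, tau0_s):
    """Ratio of the longest to the shortest dwell time across codons."""
    dwell = [dwell_time(a, tau0_s) for a in arrival_times_s]
    return max(dwell) / min(dwell)


# 0.0036 s (median) and 0.013 s (upper bound) are from the text;
# 0.0005 s is a hypothetical fast codon added for illustration.
ARRIVAL_TIMES_S = [0.0005, 0.0036, 0.013]

if __name__ == "__main__":
    print("elongation rate at median codon:", 1.0 / dwell_time(0.0036), "per s")
    print("fold variation, tau0 = 0.05 s:", fold_variation(ARRIVAL_TIMES_S, 0.05))
    print("fold variation, tau0 = 0 s:", fold_variation(ARRIVAL_TIMES_S, 0.0))
```

When τ0 dominates, dwell times are nearly uniform across codons even though arrival times differ many-fold; setting τ0 to zero exposes the full spread of arrival times.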
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Figure 3. Elongation and Aminoacylation 
Kinetics Determine Ribosome Occupancy 
at Codons 

(A) Mean ribosome occupancy at the 61 sense co- 
dons averaged across the transcriptome during 
nutrient-rich growth caicuiated from whoie-ceii 
modei. Simuiations were run with the time for intra- 
ribosomai events at a singie codon, tq, set to either 
0 s (horizontal or 0.05 s (verticai). The vaiue of was 
chosen such that the mean eiongation rate of ribo- 
somes R^' was approximateiy equal to the experi- 
mentally measured value of 20 s“^ in both cases. 

(B) Amount of transgene proteins produced per 
mRNA upon overexpression during nutrient-rich 
growth calculated from whole-cell model as a 
function of codon adaptation index (CAI) and the 
transgene fraction. All data points corresponding 
to a single transgene mRNA fraction were 
normalized by the data point at CAI = 0.9. 

(C and D) Mean ribosome occupancy at the six 
leucine codons as a function of leucylation rate 
constant calculated from whole-cell model. The 
leucylation rate constants of the five leucine tRNA 
isoacceptors were set either equal (C) or different 
(D). In the differential case (D), the leucylation rate
constants were in the proportion 1.5:0.5:1:0.5:0.5
(Leu1 through Leu5).

See also Figure S2.

To test the consistency of our whole-cell model with ribosome
profiling measurements, we simulated our model with the
above-constrained rate constants for elongation. With τ0 = 0.05 s and
k^e = 2 × 10^ M^-1 s^-1, we observed less than 2-fold variation
(SD/mean = 10%) in ribosome occupancy across the 61 codons
(Figure 3A, vertical axis). By contrast, when we set the timescale
of intraribosomal events to be zero (τ0 = 0 s), as assumed in previous
work (Shah et al., 2013; Tuller et al., 2010), ribosome occupancy
varied 34-fold (SD/mean = 105%) across the 61 codons
(Figure 3A, horizontal axis), even though for both values of τ0
(0.05 and 0 s), the average elongation rate (R^e) was ~20 codons
s^-1. Thus, we conclude that ribosome occupancy at codons during
nutrient-rich growth of E. coli, as measured using ribosome
profiling, is consistent with a model in which intraribosomal 
events, rather than the arrival of aa-tRNA to the ribosome A site, 
are the slowest steps in the elongation cycle for most codons. 
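
The contrast between the two regimes can be illustrated with a short calculation. The Python sketch below assumes that the dwell time of a ribosome at a codon is the sum of the time for intraribosomal events and the aa-tRNA arrival time, the latter being the reciprocal of the product of the codon-tRNA rate constant and the aa-tRNA concentration. This functional form, the rate constant value, and the five concentrations are illustrative assumptions, not the values used in Figure 2B or Data S1.

```python
import statistics


def dwell_times(tau0, k_e, concentrations):
    """Per-codon dwell time (s): intraribosomal time plus aa-tRNA arrival time."""
    return [tau0 + 1.0 / (k_e * n) for n in concentrations]


def fold_variation(values):
    """Ratio of the largest to the smallest value."""
    return max(values) / min(values)


def coefficient_of_variation(values):
    """Population SD divided by the mean."""
    return statistics.pstdev(values) / statistics.mean(values)


# Hypothetical aa-tRNA concentrations in molar (not measured values).
CONCENTRATIONS = [2e-6, 5e-6, 1e-5, 2e-5, 4e-5]
K_E = 2e7  # assumed codon-tRNA rate constant in M^-1 s^-1 (illustrative)

arrival_limited = dwell_times(0.0, K_E, CONCENTRATIONS)
intraribosomal_limited = dwell_times(0.05, K_E, CONCENTRATIONS)

print("fold variation, tau0 = 0 s:   ", fold_variation(arrival_limited))
print("fold variation, tau0 = 0.05 s:", fold_variation(intraribosomal_limited))
print("SD/mean, tau0 = 0 s:   ", coefficient_of_variation(arrival_limited))
print("SD/mean, tau0 = 0.05 s:", coefficient_of_variation(intraribosomal_limited))
```

With these toy numbers the dwell times span a 20-fold range when the intraribosomal time is zero and less than a 2-fold range when it is 0.05 s. Rescaling the rate constant to hold the mean elongation rate fixed would not change the fold variation in the first case, because all dwell times scale together.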

Highly expressed genes in E. coli, such as those coding for the 
translation machinery, display a characteristic codon bias (called 
the major codon bias) toward codons that are decoded by tRNA 
isoacceptors with high intracellular concentration (Andersson 
and Kurland, 1990). Motivated by this observation, biophysical 
models often predict that optimizing the codon usage of an overex- 
pressed transgene by enriching for codons decoded by abundant 
tRNA isoacceptors can significantly improve the yield of its protein 
(Shah et al., 2013; Zhang et al., 1994). Because this question is of
significant interest in biotechnological applications
(Gustafsson et al., 2004), we reexamined it in the light of our
conclusion that intraribosomal events, rather than the
arrival of aa-tRNA, limit the rate of elongation
during nutrient-rich growth.

We simulated the expression of three transgenes at different 
fractions of the cellular transcriptome and with various codon 
bias (Extended Experimental Procedures and Data S1). When
we assumed that the elongation rate at codons is proportional
to the concentration of aa-tRNAs (by setting τ0 = 0 s in Equation
2 of Figure 2B), we found that increasing codon bias can improve 
protein expression from the transgenes by >2-fold (Figure 3B, 
open markers). However, this effect of codon bias was essen- 
tially absent when elongation rates at codons were limited by in- 
traribosomal events (Figure 3B, filled markers). We observed a 
similar effect of intraribosomal events when we simulated a 
model for yeast translation (Shah et al., 2013) (Figure S2A). The 
lack of effect of codon bias on transgene overexpression is 
consistent with measurements using synthetic gene libraries in 
E. coli, which detected little correlation between yield of overexpressed
proteins and the codon adaptation index (Kudla et al.,
2009; Welch et al., 2009). Thus, based on the elongation kinetics 
inferred from ribosome profiling, we suggest that the major 
codon bias is not a significant determinant of protein yield during 
transgene overexpression in E. coli. 

Differential Aminoacylation Determines the Specificity 
of Elongation Pausing 

Having formulated our biophysical model during nutrient-rich 
growth when initiation is rate limiting for translation, we sought

Figure 4. Ribosome Traffic Jams at Ribosome Pause Sites

(A) Measured monosome occupancy from -120 nt to +120 nt around the six 
leucine codons during leucine starvation. The monosome occupancy was 
averaged across all occurrences of each codon in the transcriptome. 

(B) Measured disome occupancy from -120 nt to +120 nt around the six 
leucine codons during leucine starvation. The disome occupancy was aver- 
aged across all occurrences of each codon in the transcriptome. 

(C) Nuclease footprinting assay for detecting ribosome traffic jams on yfp re- 
porter mRNAs. The blue vertical bar along the first variant indicates the loca- 
tion of the CTG200 > CTA substitution. Northern blotting was performed using 
a 32P-labeled antisense RNA complementary to the 300 nt mRNA region from
-250 nt to +50 nt of the CTG200 > CTA substitution. 

(D) (Top) Northern blot of nuclease-digested polysomes for the three yfp 
variants. (Bottom) Polyacrylamide gel corresponding to the northern blot. 
Numbers above individual lanes correspond to the three yfp variants in 
(C). The size markers on the left of the northern blot were inferred by 
aligning it to the polyacrylamide gel image. The arrows at 30, 60, and 90 nt 
indicate the approximate locations of monosomes, disomes, and trisomes, 
respectively. 

See also Figure S3. 



to test our model in an elongation-limited regime during starvation 
for single amino acids. To simulate starvation for a single amino 
acid, we reduced the aminoacylation rate constant of the corresponding
tRNA isoacceptors (k^a in Equation 3 in Figure 2B) while
keeping all other parameters identical to those during nutrient- 
rich growth. Reducing the leucylation or the serylation rate con- 
stant caused a nonuniform increase in ribosome occupancy at 
leucine and serine codons, respectively (Figures 3C and S2B). 
However, the hierarchy predicted by the model significantly deviated
from the measured hierarchy among these codons during
both leucine and serine starvation (Figures 1B and 1C). 

The discrepancy between model and experiments in the hier- 
archy of ribosome occupancy at codons during starvation could 
arise from two sources. One possibility is that the uniform rate
constants for codon-tRNA interaction (k^e) could differ between
tRNA isoacceptors, as we used only average estimates in our
model. However, when we replaced the average estimates with
in vitro measurements of k^e, which vary over a 3-fold range for
the leucine family (Sorensen et al., 2005), it had little effect on
the predicted hierarchy of ribosome occupancy at codons (Figure
S2C). A second explanation for the above discrepancy is
that the rate constants for aminoacylation (k^a) might be unequal
between the different tRNA isoacceptors. Indeed, when we allowed
the aminoacylation rate constants for tRNA isoacceptors
to differ over a 3-fold range (Figure S2D), the model could largely 
recapitulate the experimentally measured hierarchy among co- 
dons during starvation (Figures 3D and S2E). Our modeling of dif- 
ferential aminoacylation kinetics is consistent with previous 
in vitro measurements, which found that tRNA isoacceptors 
could differ 2- to 20-fold in their aminoacylation rate constants 
(Fender et al., 2004; Harris and Marashi, 1980; Myers et al., 1971).

Accounting for aminoacylation itself has a critical effect on 
protein expression during amino acid starvation. A recent 
modeling study arrived at the conclusion that protein expression 
can be rescued during amino acid starvation by reducing the 
number of ribosomes in the cell (Shah et al., 2013). This conclu- 
sion was based on lowering the total number of tRNAs to simu- 
late amino acid starvation without considering aminoacylation 
(Figure S2F). However, there is little evidence that the total con- 
centration of tRNAs significantly changes upon amino acid star- 
vation, whereas the canonical effect of amino acid starvation is 
the reduction in aminoacyl-tRNA concentration due to lower 
aminoacylation rate (Dittmar et al., 2005; Sorensen, 2001; Sorensen
et al., 2005). When we simulated starvation by reducing the
levels of aminoacyl-tRNA rather than that of total tRNA using the
model of Shah et al. (2013), we did not observe rescue of protein
expression upon decreasing the number of ribosomes (Fig- 
ure S2G). Thus, accounting for aminoacylation qualitatively al- 
ters the prediction from biophysical models of translation in the 
elongation-limited regime of amino acid starvation. 

Ribosome Traffic Jams at Ribosome Pause Sites during 
Amino Acid Starvation 

Single mRNAs are often simultaneously translated by several ri- 
bosomes. As a result, if ribosomes pause for a sufficiently long 
duration during elongation, a traffic jam of trailing ribosomes 
can occur in the 5' region of the ribosome pause site. To detect 
traffic jams in our ribosome profiling measurement, we calcu- 
lated the average ribosome occupancy across the transcriptome 
in the 120 nt region on either side of the leucine codons during 
leucine starvation (Figure 4A). We observed an increase in ribosome
occupancy at three of the leucine codons (CUA, CUC,
and CUU), which is consistent with ribosomes pausing at these
codons during elongation. In addition, we observed smaller 
peaks in ribosome density centered approximately -28 nt and 
-56 nt upstream of these three codons (Figure 4A). These peaks 
are consistent with a traffic jam of one and two ribosomes behind

Figure 5. Translation Abortion Determines
the Distribution of Ribosomes along mRNAs 
during Amino Acid Starvation 

(A) Measured ribosome occupancy along mRNAs
averaged across the transcriptome (1,518 genes).

(B) Ribosome occupancy along mRNAs averaged
across the transcriptome (1,518 genes) calculated
from the whole-cell model. The abortion rate
constant was varied. Leucine starvation was
modeled as a constant 100-fold reduction in the
leucylation rate constant.

(C) Codon frequency of the three leucine codons— 
CTA, CTC, and CTT— in three sets of genes (red, 
green, blue) with different intragenic distributions 
of these codons. The number of genes in each 
class is shown between parentheses in the legend. 
The codon frequency distribution was smoothed 
using a Gaussian window of 30 nt width. 

(D) Ribosome occupancy averaged across the 
three sets of genes during leucine starvation 
calculated from the whole-cell model. 

(E) Measured ribosome occupancy averaged 
across the three sets of genes during leucine 
starvation. 

Ribosome occupancy profiles in all panels were 
smoothed using a sliding window of 30 nt. Each 
ribosome occupancy profile was normalized to
have a mean value of 1. See also Figure S4.



the paused ribosome. We observed similar but smaller peaks at 
-28 nt for four of the six serine codons during serine starvation 
(Figure S3A). Notably, we did not observe smaller peaks up- 
stream of Shine-Dalgarno-like codons, which also cause an in- 
crease in ribosome occupancy during nutrient-rich growth (Li 
et al., 2012) (Figure S3B). 
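
The codon-centered averaging used to detect these upstream peaks can be sketched as follows. This is a minimal illustration rather than the analysis pipeline used for Figure 4A: the function name, the dictionary-based inputs, and the choice to skip codon occurrences whose window runs off the coding sequence are all assumptions, and codons are written in the DNA alphabet (for example CTA for CUA).

```python
def mean_occupancy_around_codon(profiles, sequences, codon, flank=120):
    """Average per-nucleotide occupancy in a +/-flank nt window around a codon.

    profiles:  dict mapping gene name -> list of occupancy values, one per nt
    sequences: dict mapping gene name -> coding sequence (DNA alphabet)
    Returns a list of length 2 * flank + 1 in which index `flank` is the first
    nucleotide of the codon, or None if no occurrence has a complete window.
    """
    window = 2 * flank + 1
    totals = [0.0] * window
    count = 0
    for gene, seq in sequences.items():
        profile = profiles[gene]
        # Step through the sequence in frame, three nucleotides at a time.
        for pos in range(0, len(seq) - 2, 3):
            if seq[pos:pos + 3] != codon:
                continue
            start, end = pos - flank, pos + flank + 1
            if start < 0 or end > len(profile):
                continue  # window extends beyond the gene; skip (assumption)
            for offset in range(window):
                totals[offset] += profile[start + offset]
            count += 1
    if count == 0:
        return None
    return [total / count for total in totals]
```

In such a profile, a paused ribosome appears as a peak at the window center, and trailing ribosomes in a traffic jam would appear as smaller peaks at roughly one and two ribosome footprints upstream.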

Previous work has shown that nuclease treatment of poly- 
somes with ribosome traffic jams can result in longer mRNA 
footprints that are protected by multiple ribosomes (Guydosh 
and Green, 2014; Wolin and Walter, 1988). To test whether 
longer mRNA footprints occur during amino acid starvation in 
E. coli, we measured the size of nuclease-protected mRNA 
fragments using three yellow fluorescent protein (yfp) reporter 
variants (Figure 4C). We observed a ~30 nt fragment corre- 
sponding to the monosome for all yfp variants both during rich 
growth and during leucine starvation (Figure 4D). In addition, 
we observed a prominent ~60 nt fragment and a weaker ~90 
nt fragment in the yfp variant with the CUA codon during leucine 
starvation. These longer fragments were either faint or absent in 
the control variants without CUA codons and during nutrient- 
rich growth, which is consistent with a traffic jam of multiple 
ribosomes caused by pausing of ribosomes at the CUA codon 
during leucine starvation. These results also suggest that 
the standard ribosome profiling method, in which only mono- 
some-protected fragments are sequenced, underestimates 
the actual in vivo magnitude of ribosome occupancy 5' to the 
pause site in the presence of ribosome traffic jams. Deep 
sequencing of longer mRNA fragments (50-80 nt) that were protected
by two ribosomes (disomes) from nuclease digestion revealed
an increase in average disome occupancy up to 90 nt
upstream of CUA codons during leucine starvation (Figure 4B),
which is consistent with a traffic jam of two to three
ribosomes behind the paused ribosome. The length of this region
with increased footprint density was similar to that obtained
using the standard ribosome profiling method in which
only short (~30 nt) monosome-protected mRNA fragments 
were sequenced (Figure 4A), which suggests that both the 
paused, leading ribosome as well as the jammed, trailing ribo- 
somes have an equal likelihood of occurring either as mono- 
somes or as disomes upon nuclease treatment. 

Translation Abortion Determines the Distribution of 
Ribosomes along mRNAs during Amino Acid Starvation 

The strong ribosome pausing that we observed during starvation 
for single amino acids enabled us to test whether the measured 
ribosome occupancy along mRNAs is quantitatively consistent 
with a biophysical model of ribosome traffic jams. During 
nutrient-rich growth, except for a peak at the start codon, ribo- 
some occupancy was uniformly distributed across the entire 
length of the mRNA (Figure 5A, gray line), which is similar to pre- 
vious observations (Oh et al., 2011). By contrast, during starva- 
tion for leucine, the distribution of ribosome occupancy was 
highly nonuniform, varying over a 10-fold range within the first 
900 nt (Figure 5A, black line). Ribosome occupancy increased 
sharply to a maximum at around 30 nt from the start codon 
and then decreased toward the 3' end of the mRNA. Starvation 
for serine produced a similar but less skewed distribution of ribo- 
some occupancy along mRNAs (Figure S4A). By comparison, 
the distribution of total mRNA density was not significantly 
skewed during leucine starvation (Figure S4B).

We then used the distribution of measured ribosome occupancy
along mRNAs to evaluate our biophysical model of translation.
Our model reproduced the uniform distribution of
measured ribosome occupancy during nutrient-rich growth (Fig- 
ure 5B, gray line), which is consistent with initiation being rate 
limiting for translation under these conditions (Li et al., 2014). 
However, when we simulated leucine starvation by a 100-fold 
reduction in the leucylation rate constant, a model that included 
only ribosome traffic jams predicted a more uniform distribution 
of ribosome occupancy than the one observed experimentally 
(Figure 5B, blue line, compared to black line in Figure 5A). This 
disagreement between the model and our measurements could 
not be offset by varying the leucylation rate, the only remaining 
free parameter in our model, over a 1,000-fold range (Figure S4C)
and was also observed for serine starvation (Figure S4D). 

The discrepancy in the distribution of ribosome occupancy 
between ribosome profiling measurements and our model that 
included only ribosome traffic jams led us to consider other mo- 
lecular processes that occur in response to ribosome pausing. 
Translation abortion is a widespread mechanism of ribosome 
rescue that has been observed in both bacteria and eukaryotes 
in response to ribosome pausing (Shoemaker and Green, 2012). 
Inactivation of translation abortion factors can decrease the 
growth rate of cells during stressful perturbations (Keiler and 
Feaga, 2014), indicating the importance of abortion during 
stress. We modeled translation abortion (Equation 4 in Figure 2B) 
as occurring at a constant rate whenever the elongation rate 
of ribosomes falls below a threshold, R^e_threshold. This threshold
elongation rate was set below the elongation rate of all codons 
under nutrient-rich growth to account for the experimental 
observation that abortion is generally selective for paused ribo- 
somes (Moore and Sauer, 2005). In our model, increasing the 
rate of translation abortion (k^ab) produced higher ribosome occupancy
at the 5' end of mRNAs during starvation (Figures 5B and
S4D). An in vivo abortion rate (k^ab) of ~0.1 s^-1 in our model predicted
an average distribution of ribosome occupancy that
matched the experimentally observed distribution during both 
leucine and serine starvation (Figures 5B and S4D, black lines). 
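
A back-of-the-envelope calculation shows why abortion skews ribosome occupancy toward the 5' end. The Python sketch below treats elongation out of a pause and abortion as two competing first-order processes at each pause site. It is a deliberate simplification of the whole-cell model: it ignores traffic jams and the threshold rule, and the pause duration and pause-site positions are hypothetical.

```python
def survival_per_pause(k_ab, pause_time):
    """Probability that a paused ribosome resumes elongation instead of aborting.

    Elongation out of the pause occurs at rate 1 / pause_time and abortion at
    rate k_ab; the two are modeled as competing first-order processes.
    """
    k_el = 1.0 / pause_time
    return k_el / (k_el + k_ab)


def relative_flux(pause_positions, length, k_ab, pause_time):
    """Fraction of initiating ribosomes that reach each codon of an mRNA."""
    survival = survival_per_pause(k_ab, pause_time)
    pauses = set(pause_positions)
    flux = []
    surviving = 1.0
    for codon in range(length):
        flux.append(surviving)  # ribosomes arriving at this codon
        if codon in pauses:
            surviving *= survival  # some ribosomes abort at the pause site
    return flux


# Hypothetical example: two pause sites, a 10 s pause, abortion at 0.1 s^-1.
print(relative_flux([2, 5], length=8, k_ab=0.1, pause_time=10.0))
```

In this toy example half of the ribosomes are lost at each pause site, so the flux of ribosomes falls stepwise toward the 3' end. Occupancy at a codon is the flux multiplied by the dwell time there, so the pause sites themselves would still show elevated occupancy.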

To test further the role of translation abortion at ribosome 
pause sites in determining the distribution of ribosome occu- 
pancy along mRNAs, we analyzed three subsets of E. coli genes 
with varying codon usage patterns (Figure 5C). These gene sets 
had different intragenic locations of the three leucine codons 
CUA, CUC, and CUU that result in ribosome pausing during
leucine starvation (Figure 1B). These three codons were present
at a high frequency within the first 300 nt in the first set (Figure 5C, 
red, 323 genes), between 300 and 600 nt in the second set (Fig- 
ure 5C, green, 70 genes), and at a low frequency throughout the 
third set (Figure 5C, blue, 337 genes). The whole-cell model 
incorporating abortion predicted a decreased ribosome occu- 
pancy following the mRNA region where the CUA, CUC, and 
CUU codons were present at high frequency in these gene 
sets (Figure 5D). This prediction matched the measured distribu- 
tion of ribosome occupancy for all three of the gene sets (Fig- 
ure 5E). By contrast, a model that incorporated only ribosome 
traffic jams did not recapitulate the measured distribution of ribo- 
some occupancy in the three gene sets (Figure S4E). Together, 
these observations indicate that incorporating abortion at ribosome
pause sites in our whole-cell model of translation is necessary
for quantitative consistency with the measured ribosome
occupancy along mRNAs during starvation for single amino
acids in E. coli.

Translation Abortion and Its Effectors during Amino Acid 
Starvation 

Based on the prediction of translation abortion from our analysis 
of the measured ribosome occupancy, we looked for abortion 
events at ribosome pause sites using a yfp reporter system. 
We constructed synonymous variants of yfp that encoded a 
3xFLAG epitope at the N terminus for detection of incomplete 
polypeptides and that had a single leucine starvation-sensitive 
CTA codon at one of three different locations along 3xflag-yfp 
(Figure 6A). During leucine starvation, we detected shorter- 
length polypeptide fragments whose size was consistent with 
premature abortion at the CUA codon (Figure 6B). 

Paused ribosomes with an empty A site are stable in vitro (Ivanova
et al., 2005), which suggests a role for a trans-acting factor
in mediating abortion at ribosome pause sites during leucine 
starvation. Hence, we tested the role of four known abortion-mediating
factors in E. coli, namely tmRNA (Keiler et al., 1996), RF3
(Zaher and Green, 2011), ArfA (Chadani et al., 2010), and ArfB
(Chadani et al., 2011), by deleting the genes for each of these
factors individually (Figure 6C). Deletion of the gene encoding 
tmRNA, a chimeric transfer-messenger RNA that releases 
paused ribosomes and tags the incomplete polypeptide for pro- 
teolysis (Keiler et al., 1996), caused a significant increase in the 
ratio of the incomplete-to-full-length YFP (Figure 6C). Further, 
complementation with a tmRNA^His6 mutant (Roche and Sauer,
2001), which adds a modified His6 proteolysis-resistant tag but 
still causes ribosome release, revealed an incomplete YFP poly- 
peptide with a His6 epitope upon immunoprecipitation (Fig- 
ure 6D). Together, these experiments suggest that tmRNA medi- 
ates abortion at ribosome pause sites during leucine starvation. 
This conclusion is consistent with a previously ascribed role for 
tmRNA during amino acid starvation (Garza-Sanchez et al., 
2008; Li et al., 2008). However, in contrast to these studies and 
previous in vitro experiments (Ivanova et al., 2004), we did not 
find evidence for significant mRNA cleavage near the pause 
site during amino acid starvation (Figure S5A). Furthermore, 
the measured total mRNA density around the pause site did 
not indicate appreciable degradation of the region 3' to the 
pause site (Figure S5B). Finally, ribosome profiling in a ΔtmRNA
strain showed only a slight decrease in ribosome occupancy at
the 5' end of mRNAs compared to a tmRNA+ strain during
leucine starvation (Figure S5C). This observation suggests that 
abortion is still efficient in the absence of tmRNA and is likely 
mediated by the abortion-mediating factor ArfA. Such an auxil- 
iary role for ArfA in the absence of tmRNA is supported by earlier 
studies, which found that both the arfA mRNA and a truncated 
but functional form of the ArfA protein are highly stabilized by 
the deletion of tmRNA (Garza-Sanchez et al., 2011) and that double
knockout of arfA and tmRNA is synthetic lethal (Chadani
et al., 2010). 

We then tested whether ribosome pausing and abortion also
affect the expression level of an endogenous E. coli protein,
the sigma factor RpoS, that is highly upregulated at the

Figure 6. Translation Abortion and Its Effectors
during Amino Acid Starvation

(A) Schematic of 3xflag-yfp reporter variants with
either single CTG > CTA substitutions (indicated in
blue) or truncated at one of three locations.

(B) Western blot using anti-FLAG antibody for the
3xflag-yfp variants shown in (A).

(C) Western blot with anti-FLAG antibody of the
CTG200 > CTA variant of yfp during leucine
starvation in strains with deletion of one of four
different genes encoding factors that mediate
translation abortion (tmRNA, prfC, arfA, arfB).
“Wild-type” refers to the parent leucine auxotroph
strain. The bottom panel indicates the densitometric
ratio of these two bands.

(D) (Top) Immunoprecipitation with anti-FLAG
antibody of CTG200 > CTA yfp variant expressed
during leucine starvation in a ΔtmRNA strain with a
tmRNA^His6 mutant. (Bottom) tmRNA^His6 activity
detected with an anti-His6 antibody.

(E) (Pie charts) Relative frequency of the six leucine
codons across all coding sequences in the
genome, in the rpoS wild-type coding sequence, 
and in the rpoS synonymous variant. Four TTA 
codons were replaced by CTA codons in the rpoS 
synonymous variant at the locations indicated by 
thick blue bars. The thin blue and green bars 
correspond to the location of the CTC and CTT 
codons in the rpoS WT and synonymous variant. 
Blue triangle indicates the location of the first 
ribosome pause site encoded by the CTA codon 
during leucine starvation. 

(F) Western blot against the RpoS protein (top) and 
RpoD protein (bottom) during nutrient-rich growth, 
leucine starvation, and glucose starvation. The 
rpoS wild-type coding sequence at the native 
chromosomal locus was either deleted (ΔrpoS) or
replaced by the rpoS TTA > CTA synonymous 
variant without additional selection markers. 
Numbers between the two panels indicate the 
normalized densitometric ratio of the RpoS and 
RpoD bands for each lane. 

(G) Western blot with anti-FLAG antibody against 
3xFLAG-YFP-RpoS fusion proteins during leucine 

starvation. Approximate molecular weight in kilodaltons (kD) was estimated using a protein ladder. Blue triangle corresponds to the approximate location of the 
expected truncated peptide caused by ribosome abortion at the first pause site in the rpoS TTA > CTA synonymous variant (indicated as a blue triangle in E). 
See also Figure S5 and Table S2.

transcriptional level in response to leucine starvation (Subramaniam
et al., 2013a). Ribosome pause sites encoded by the two
leucine codons CTA and CTC during leucine starvation are un- 
derrepresented in the rpoS protein coding sequence in compar- 
ison to their average frequency in the genome (Figure 6E), which 
suggests that ribosome pause sites during leucine starvation are 
selected against in the rpoS coding sequence due to their 
adverse effect on the expression of this critical stress response 
protein. To test the effect of ribosome pause sites on RpoS 
expression, we substituted, at the chromosomal rpoS locus, 
four leucine TTA codons with the synonymous CTA codons 
that encode ribosome pause sites (Figure 6E, thick blue bars). 
Western blotting indicated that the expression level of RpoS 
decreased ~6-fold during leucine starvation due to the presence 
of the CTA codons in the rpoS coding sequence (Figure 6F). This 
decrease was specific to leucine starvation, as the synonymous
substitutions did not significantly affect the expression of RpoS
during glucose starvation, which also strongly upregulates 
RpoS (Figure 6F). By fusing 3xFLAG-YFP to the N terminus of 
RpoS, we detected a truncated peptide (Figure 6G, blue triangle) 
whose size is consistent with translation abortion at the first ribo- 
some pause site in the rpoS synonymous variant during leucine 
starvation (Figure 6E, blue triangle). 

Effect of Translation Abortion on Protein Expression 
during Amino Acid Starvation 

Previous studies have suggested that the primary function of 
abortion mediated by factors such as tmRNA is to rescue inac- 
tive mRNA-bound ribosomes during stress and thereby increase 
the translational capacity in the cell (Moore and Sauer, 2007). To 
quantify this effect of abortion on ribosome rescue and translational
capacity, we tracked the number of free ribosomes and

Figure 7. Effect of Translation Abortion on Protein Expression

(A) Effect of varying abortion rate constant (k^ab) on the number of free ribosomes in the cell (gray circles) and the global synthesis rate of complete proteins (black
triangles) during leucine starvation calculated from whole-cell model. The value k^ab = 0.1 s^-1 that fits the measured ribosome occupancy (black line in Figure 5B) is
indicated as a dashed line.

(B) Effect of non-zero abortion rate constant on the synthesis rate of individual E. coli proteins during leucine starvation calculated from the whole-cell model. 

(C) Average frequency of the three leucine codons CTA, CTC, and CTT for genes in each of the histogram bins in (B). Only genes with greater than ten leucine 
codons were considered in (B) and (C). 

See also Figure S6. 



the global synthesis rate of proteins as a function of the abortion 
rate in our whole-cell model (Figures 7A and S6). Based on earlier 
work (Bremer and Dennis, 2008), we assumed that 15% of the 
ribosomes in a cell are free (not bound to mRNAs) during 
nutrient-rich growth. In the absence of abortion (k^ab = 0 s^-1), starvation
for leucine decreased the fraction of free ribosomes in the
cell to 7% (Figure 7A, leftmost circle). Increasing the rate of abortion
during leucine starvation gradually rescued the fraction of
free ribosomes to a maximum of ~60% in our model (Figure 7A,
rightmost circle). We then examined the effect of abortion on
global protein expression using our whole-cell model. In the
absence of abortion (k^ab = 0 s^-1), leucine starvation decreased
the synthesis rate of complete proteins in the cell to 35% of its
value during nutrient-rich growth (Figure 7A, leftmost triangle).
Increasing the rate of abortion during leucine starvation further
decreased the global synthesis rate to a minimum of ~15% (Figure
7A, rightmost triangle). Notably, increasing the abortion rate
had widely different effects on protein expression from individual 
mRNAs during leucine starvation (Figure 7B). Proteins with 
higher expression level upon increasing the abortion rate also 
had a lower frequency of ribosome pause sites in the corre- 
sponding mRNAs (Figure 7C). As a result, protein expression 
from these mRNAs is less susceptible to abortion, and their 
translation initiation rate is increased by the released ribosomes 
due to abortion from other mRNAs. Thus, our model predicts that 
abortion enhances the translation of mRNAs in a selective 
manner even though it can have a deleterious effect on global 
protein expression during amino acid starvation. 

DISCUSSION 

Here, we formulated a biophysical model of translation in E. coli 
by leveraging the near single-codon resolution of the ribosome 
profiling method. Notably, we constrained our model not only 



in the initiation-limited regime of nutrient-rich growth, but also 
in the regime of amino acid starvation, during which the elonga- 
tion rate of ribosomes has a large effect on the measured ribo- 
some occupancy. By contrast, previous computational models 
of translation used data solely from initiation-limited regimes, 
with the consequence that mechanistic features of the elonga- 
tion stage of translation were not fully constrained (Shah et al., 
2013; Tuller et al., 2010). As our results generally illustrate, the 
exact mechanistic features assumed for the elongation stage 
have a critical role in models of translation, and changing these 
features qualitatively alters several of the model predictions. 

Consistency between ribosome profiling measurements and 
our model suggests that the concentration of aminoacyl-tRNAs 
does not limit the elongation rate of ribosomes at most codons 
during nutrient-rich growth. This conclusion relies on our 
assumption that biases in the generation and analysis of ribo- 
some footprinting data are sufficiently small such that the 
measured ribosome occupancy at codons reflects the residence 
times of ribosomes with an empty A site at these codons. This 
assumption is partly supported by our observation that differ- 
ences in ribosome occupancy at codons during starvation for 
their cognate amino acid are consistent with their corresponding 
effect on protein expression. A more direct test of our conclusion 
will be to characterize the effect of overexpressing low-abun- 
dance tRNA isoacceptors on the measured ribosome occupancy 
during nutrient-rich growth. Our model still predicts a small vari- 
ation in ribosome occupancy at codons that is correlated with 
tRNA abundance, the exact magnitude of which depends on 
the kcat/KM for ribosome-tRNA association (Figure 3A, vertical
axis). Although we predict that this small variation neither limits 
the ribosome elongation rate at most codons nor affects protein 
levels during nutrient-rich growth on the physiological timescale 
of protein synthesis, it can nevertheless leave an evolutionary 
signature on protein coding sequences and thus underlie the
widely observed correlation between tRNA abundance and
codon frequency in microorganisms (Andersson and Kurland, 
1990; Drummond and Wilke, 2008; Wallace et al., 2013). 

Our analysis of aminoacylation kinetics raises the interesting 
possibility that cells might utilize differential aminoacylation rates 
of tRNA isoacceptors as a mechanism to regulate the elongation 
rate of codons specifically during nutrient stress. Further, 
small differences in aminoacylation rate between tRNA isoacceptors
might also modulate mis-aminoacylation rates during
other stresses (Netzer et al., 2009) while having little or no dele- 
terious effect on translation during nutrient-rich growth when 
aminoacylation is not limiting for elongation. 

Our work provides in vivo evidence for the widespread occur- 
rence of ribosome traffic jams in response to pausing of a leading 
ribosome. However, we found that translation abortion at ribo- 
some pause sites is the primary determinant of ribosome occu- 
pancy along mRNAs during amino acid starvation. Although 
abortion from non-stop mRNAs is thought to increase global pro- 
tein expression during nutrient-rich growth (Moore and Sauer, 
2007), our whole-cell modeling suggests that abortion of paused 
ribosomes during nutrient stress might have the function of 
enabling selective translation from specific mRNAs. Additionally, 
releasing nascent polypeptides from paused ribosomes by abor- 
tion might facilitate their proteolysis and prevent protein misfold- 
ing during stress. However, previous studies have found that 
eliminating the ability of the abortion-mediating factor tmRNA 
to target nascent polypeptides to proteolysis does not signifi- 
cantly alter the cellular response to stress (Abo et al., 2002; 
Huang et al., 2000), suggesting that prevention of protein mis- 
folding is not the primary cellular function of abortion-mediating 
factors during stress. 

The four processes that we modeled in our study (Figure 2) are 
an essential part of protein synthesis in both bacteria and 
eukaryotes. Hence, our analysis in E. coli can be readily 
extended to eukaryotes, where ribosome profiling has revealed 
that a variety of stresses result in a slow elongation rate (Liu 
et al., 2013; Shalgi et al., 2013). It will also be useful to integrate 
our whole-cell model of translation with quantitative models of 
other cellular processes such as transcription (Brewster et al., 
2014) and metabolism (Bordbar et al., 2014). Such an integrated 
model can shed light on the complex interplay between meta- 
bolism and gene expression that occurs during environmental 
changes (Subramaniam et al., 2013b). 

EXPERIMENTAL PROCEDURES 

Construction of all strains and plasmids (listed in Table S2), western blots, and 
northern blots were performed using standard molecular biology techniques 
(Extended Experimental Procedures). 

Ribosome Profiling 

Ribosome profiling was carried out as described previously (Li et al., 2012; Oh 
et al., 2011), with the following modifications. To accurately capture the ribo- 
some occupancy on mRNAs with single-codon resolution, we flash froze the 
cells immediately upon harvesting and stabilized ribosomes with the transla- 
tion inhibitor chloramphenicol only at the lysis stage. Cells were lysed using 
glass beads (G1277, Sigma, vortex 10 × 30 s at 4°C with 60 s cooling on ice
in between). Micrococcal nuclease digestion was carried out with 1 U Worthington
Biochemicals MNase per µg of nucleic acid, as measured by A260.
Monosome-protected mRNA footprints between 20 and 40 nt were size
selected by polyacrylamide gel electrophoresis for monosome sequencing.
For disome sequencing, the disome peak was collected from the MNase- 
treated polysomes after sucrose-gradient fractionation, and fragments be- 
tween 50 and 80 nt were used for sequencing. For total mRNA sequencing, 
the Microbe Express kit (Ambion) was used to subtract rRNA, and the remaining
RNA was then fragmented using a bicarbonate buffer (Ingolia et al., 2009)
for 20 min. For library construction, polyA-tailing (Ingolia et al., 2009) was 
used instead of linker ligation. 

High-Throughput Sequencing Data Analysis

Analysis steps were similar to those in previous ribosome profiling studies (Li
et al., 2012; Subramaniam et al., 2013b) and were implemented using Python 
and Bash programming languages. Full programming code for generating the 
final figures in our paper starting from raw sequencing data is provided both 
as an interactive IPython notebook (Perez and Granger, 2007) and as a static 
HTML file (sequencing_data_analysis.html in Data S1). In brief, single-end reads
were polyA-trimmed and then aligned to the E. coli genome (NC_000913.3 build)
using Bowtie (Langmead et al., 2009). Aligned reads were trimmed by 8 nt on 
each side. Each genomic position corresponding to the trimmed read was as- 
signed ribosome occupancy equal to the inverse of the read length. The tran- 
scriptome-averaged ribosome occupancy at individual codons for each sample 
(Figures 1B and 1C) was computed by first averaging the ribosome occupancy
at the first nucleotide position of the codon across all occurrences of that codon 
within each coding sequence and then by averaging across all coding se- 
quences that had a minimum average occupancy of one read/codon. Because 
the start codon and the three stop codons have a high ribosome density during 
both nutrient-rich growth and starvation, they were excluded in the plots 
showing all 61 sense codons (Figures 1 and S1) for clarity. The ribosome occupancy
profiles around codons (Figures 5A and 5B) were calculated using the
same procedure but for the 120 nt region on each side of the codon rather 
than just the first nucleotide position of the codon. Ribosome occupancy along 
mRNAs (Figure 6A) was computed by first normalizing the ribosome occupancy 
at each position of a coding sequence by the average ribosome occupancy for 
the full coding sequence, and then by averaging this quantity at each position 
beginning from the start codon across all coding sequences that had a minimum 
average occupancy of one read/codon. 
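
The occupancy assignment and the two-level codon averaging described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code provided in Data S1: the function names, the (start, end) read format, and the forward-strand-only coding sequences are hypothetical, and the weight is taken to be the inverse of the trimmed read length so that each read contributes a total occupancy of one (the text does not specify which length is used).

```python
from collections import defaultdict

TRIM_NT = 8  # nt removed from each end of an aligned read, as stated in the text


def read_weights(start, end, trim=TRIM_NT):
    """Map each position of the trimmed read to its occupancy weight.

    start, end: inclusive coordinates of the aligned read (hypothetical format).
    Assumption: weight = 1 / (trimmed length), so every read sums to 1.
    """
    first, last = start + trim, end - trim
    if last < first:
        return {}  # nothing left after trimming
    weight = 1.0 / (last - first + 1)
    return {pos: weight for pos in range(first, last + 1)}


def genome_occupancy(reads):
    """Sum the per-read weights over all aligned reads."""
    occupancy = defaultdict(float)
    for start, end in reads:
        for pos, weight in read_weights(start, end).items():
            occupancy[pos] += weight
    return occupancy


def codon_averages(occupancy, coding_sequences, min_reads_per_codon=1.0):
    """Average occupancy at the first nt of each codon, within then across genes.

    coding_sequences: list of (cds_start, sequence), forward strand only
    (a simplification made for this sketch).
    """
    per_gene_means = defaultdict(list)
    for cds_start, seq in coding_sequences:
        n_codons = len(seq) // 3
        if n_codons == 0:
            continue
        total = sum(occupancy.get(cds_start + i, 0.0) for i in range(3 * n_codons))
        if total / n_codons < min_reads_per_codon:
            continue  # gene fails the coverage threshold
        within_gene = defaultdict(list)
        for k in range(n_codons):
            codon = seq[3 * k:3 * k + 3]
            within_gene[codon].append(occupancy.get(cds_start + 3 * k, 0.0))
        for codon, values in within_gene.items():
            per_gene_means[codon].append(sum(values) / len(values))
    return {codon: sum(v) / len(v) for codon, v in per_gene_means.items()}
```

Averaging within each coding sequence before averaging across coding sequences keeps a few highly covered genes from dominating the transcriptome-wide value for a codon.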

Whole-Cell Model of E. coli Translation 

Our whole-cell model for translation (Figure 2) was implemented using the 
stochastic Gillespie algorithm (Gillespie, 1977). The source code for implementing
the kinetic model in Figure 2 was adapted from Shah et al. (2013),
with modifications as shown in Data S1. Full programming code for running
the simulation and for the subsequent data analysis to generate the final 
figures in our paper is provided both as an interactive IPython notebook 
(Perez and Granger, 2007) and as a static HTML file (simulation.html and
simulation_data_analysis.html in Data S1). Standard errors of the mean for model
predictions in all figures are smaller than data markers. Our whole-cell model
tracked the state of 44,000 ribosomes, 408,000 tRNA molecules (38 distinct 
tRNA species), and ~7,500 mRNA molecules (1,518 distinct mRNA species) 
in the E. coli cell (Table S1). The transition rates between different states of
each of these molecules are determined by the four rate equations in Figure 2B. 
The key parameters that control the predictions from our model are the rate 
constants k for the four processes in Figure 2B. Based on the analysis pre- 
sented in the main text, we chose a default set of parameters for all of our 
simulations. Table S1 lists these choices, together with the corresponding references
and a footnote explaining the choice (see also Extended Experimental
Procedures). We chose parameters corresponding to a cell doubling time of 
30 min that we measured during nutrient-rich growth in our experiments. 
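
For readers unfamiliar with the Gillespie algorithm, the following Python sketch shows its core loop on a deliberately small toy system. It is not the whole-cell model of Figure 2: the two-state tRNA pool, the function name, and the rate constants k_charge and k_use are hypothetical and chosen only to illustrate how waiting times and reaction choices are drawn.

```python
import random


def gillespie_trna_pool(n_charged, n_uncharged, k_charge, k_use, t_end, seed=0):
    """Simulate a toy tRNA pool that cycles between charged and uncharged states.

    Reactions (illustrative only):
      uncharged -> charged   with propensity k_charge * n_uncharged
      charged   -> uncharged with propensity k_use * n_charged
    Returns a list of (time, n_charged) tuples.
    """
    rng = random.Random(seed)
    t = 0.0
    trajectory = [(t, n_charged)]
    while True:
        a_charge = k_charge * n_uncharged
        a_use = k_use * n_charged
        a_total = a_charge + a_use
        if a_total == 0:
            break  # no reaction can fire
        # Waiting time to the next reaction is exponential with rate a_total.
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Pick a reaction with probability proportional to its propensity.
        if rng.random() * a_total < a_charge:
            n_charged, n_uncharged = n_charged + 1, n_uncharged - 1
        else:
            n_charged, n_uncharged = n_charged - 1, n_uncharged + 1
        trajectory.append((t, n_charged))
    return trajectory
```

The same loop extends to many molecular species by keeping one propensity per possible transition, which is how a model tracking individual ribosomes, tRNAs, and mRNAs can be simulated in principle.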

ACCESSION NUMBERS 

Sequencing data are publicly available from Gene Expression Omnibus, 
accession number GSE51052. Simulation results and programming code for
simulation, sequencing data analysis, simulation data analysis, and reproduction
of figures are publicly available at http://datadryad.org/, http://dx.doi.org/10.5061/dryad.ch352.
SUMMARY

Just as reference genome sequences revolutionized 
human genetics, reference maps of interactome 
networks will be critical to fully understand geno- 
type-phenotype relationships. Here, we describe a 
systematic map of ~14,000 high-quality human binary
protein-protein interactions. At equal quality,
this map is ~30% larger than what is available from
small-scale studies published in the literature in the 
last few decades. While currently available informa- 
tion is highly biased and only covers a relatively small 
portion of the proteome, our systematic map appears 
strikingly more homogeneous, revealing a “broader” 
human interactome network than currently appre- 
ciated. The map also uncovers significant inter- 
connectivity between known and candidate cancer 
gene products, providing unbiased evidence for an
expanded functional cancer landscape, while demon- 
strating how high-quality interactome models will 
help “connect the dots” of the genomic revolution. 

INTRODUCTION 

Since the release of a high-quality human genome sequence a 
decade ago (International Human Genome Sequencing Con- 
sortium, 2004), our ability to assign genotypes to phenotypes 
has exploded. Genes have been identified for most Mendelian dis- 
orders (Hamosh et al., 2005) and over 100,000 alleles have been 
implicated in at least one disorder (Stenson et al., 2014). Hundreds
of susceptibility loci have been uncovered for numerous complex
traits (Hindorff et al., 2009) and the genomes of a few thousand human
tumors have been nearly fully sequenced (Chin et al., 2011).
This genomic revolution is poised to generate a complete descrip- 
tion of all relevant genotypic variations in the human population. 
Genomic sequencing will, however, if performed in isolation,
leave fundamental questions pertaining to genotype-phenotype 
relationships unresolved (Vidal et al., 2011). The causal changes 
that connect genotype to phenotype remain generally unknown, 
especially for complex trait loci and cancer-associated mu- 
tations. Even when identified, it is often unclear how a causal mu- 
tation perturbs the function of the corresponding gene or gene 
product. To “connect the dots” of the genomic revolution, func- 
tions and context must be assigned to large numbers of geno- 
typic changes. 

Complex cellular systems formed by interactions among genes 
and gene products, or interactome networks, appear to underlie 
most cellular functions (Vidal et al., 2011). Thus, a full understand- 
ing of genotype-phenotype relationships in human will require 
mechanistic descriptions of how interactome networks are per- 
turbed as a result of inherited and somatic disease susceptibil- 
ities. This, in turn, will require high-quality and extensive genome 
and proteome-scale maps of macromolecular interactions such 
as protein-protein interactions (PPIs), protein-nucleic acid inter- 
actions, and posttranslational modifiers and their targets. 

First-generation human binary PPI interactome maps (Rual 
et al., 2005; Stelzl et al., 2005) have already provided network- 
based explanations for some genotype-phenotype relation- 
ships, but they remain incomplete and of insufficient quality to 
derive accurate global interpretations (Figure S1A available online).
There is a dire need for empirically-controlled (Venkatesan
et al., 2009) high-quality proteome-scale interactome reference 
maps, reminiscent of the high-quality reference genome 
sequence that revolutionized human genetics. 

The challenges are manifold. Even considering only one splice 
variant per gene, approximately 20,000 protein-coding genes 
(Kim et al., 2014; Wilhelm et al., 2014) must be handled and 
~200 million protein pairs tested to generate a comprehensive bi- 
nary reference PPI map. Whether such a comprehensive network 
could ever be mapped by the collective efforts of small-scale 
studies remains uncertain. Computational predictions of protein 
interactions can generate information at proteome scale (Zhang 
et al., 2012) but are inherently limited by biases in currently avail- 
able knowledge used to infer such interactome models. Should in- 
teractome maps be generated for all individual human tissues us- 
ing biochemical cocomplex association data, or would “context- 
free” information on direct binary biophysical interaction for all 
possible PPIs be preferable? To what extent would these ap- 
proaches be complementary? Even with nearly complete, high- 
quality reference interactome maps of biophysical interactions, 
how can the biological relevance of each interaction be evaluated 
under physiological conditions? Here, we begin to address these 
questions by generating a proteome-scale map of the human bi- 
nary interactome and comparing it to alternative network maps. 
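
The figure of ~200 million protein pairs quoted above follows from counting unordered pairs among approximately 20,000 proteins. A short check, assuming one protein per gene and excluding self-pairs (both assumptions of this sketch; the function name is illustrative):

```python
import math


def pair_count(n_proteins):
    """Number of unordered pairs of distinct proteins, n * (n - 1) / 2."""
    return math.comb(n_proteins, 2)


print(f"{pair_count(20_000):,}")  # 199,990,000
```

Counting homodimeric self-pairs as well would add only 20,000 to this total, so the order of magnitude is unchanged.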

RESULTS 

Vast Uncharted Interactome Zone in Literature 

To investigate whether small-scale studies described in the liter- 
ature are adequate to qualitatively and comprehensively map the 
human binary PPI network, we assembled all binary pairs identi- 
fied in such studies and available as of 2013 from seven public 
databases (Figure S1B; see Extended Experimental Procedures,
Section 1). Out of the 33,000 literature binary pairs extracted, two
thirds were reported in only a single publication and detected by
only a single method (Lit-BS pairs), thus potentially presenting
higher rates of curation errors than binary pairs supported by multiple
pieces of evidence (Lit-BM pairs; Tables S1A, S1B, and S1C)
(Cusick et al., 2009). Testing representative samples from both of 
these sets using the mammalian protein-protein interaction trap 
(MAPPIT) (Eyckerman et al., 2001) and yeast two-hybrid (Y2H) 
(Dreze et al., 2010) assays, we observed that Lit-BS pairs were 
recovered at rates that were only slightly higher than the ran- 
domly selected protein pairs used as negative control (random 
reference set; RRS) and significantly lower than Lit-BM pairs (Figure 1A
and Table S2A; see Extended Experimental Procedures,
Section 2). Lit-BS pairs co-occurred in the literature significantly 
less often than Lit-BM pairs as indicated by STRING literature 
mining scores (Figure 1A and Figure S1C; see Extended Experimental
Procedures, Section 2) (von Mering et al., 2003), suggesting
that these pairs were less thoroughly studied. Therefore, use
of binary PPI information from public databases should be 
restricted to interactions with multiple pieces of evidence in the 
literature. In 2013, this corresponded to 11,045 high-quality protein
pairs (Lit-BM-13), more than an order of magnitude below
current estimates of the number of PPIs in the full human interac- 
tome (Stumpf et al., 2008; Venkatesan et al., 2009). 

The relatively low number of high-quality binary literature PPIs 
may reflect inspection biases inherent to small-scale studies. 
Some genes such as RB1 are described in hundreds of publi- 
cations while most have been mentioned only in a few (e.g., 
the unannotated C11orf21 gene). To investigate the effect of 
such biases on the current coverage of the human interactome 
network, we organized the interactome search space by ranking 
proteins according to the number of publications in which they 
are mentioned (Figure 1B). Interactions between highly studied
proteins formed a striking “dense zone” in contrast to a large 
sparsely populated zone, or “sparse zone,” involving poorly 
studied proteins. Candidate gene products identified in 
genome-wide association studies (GWAS) or associated with 
Mendelian disorders distribute homogeneously across the publication-ranked
interactome space (Figure 1B and Figure S1D),
demonstrating a need for unbiased systematic PPI mapping to 
cover this uncharted territory. 
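
The publication-ranked organization of the search space can be illustrated with a short Python sketch that ranks proteins by publication count, groups them into fixed-size bins, and counts interactions between bins. The function name, the input formats, and the choice to place the most-studied proteins first are assumptions of this sketch; the bin size of 350 follows the Figure 1B legend.

```python
def binned_adjacency(pub_counts, interactions, bin_size=350):
    """Count interactions between publication-ranked protein bins.

    pub_counts: dict mapping protein -> number of publications (hypothetical input).
    interactions: iterable of (protein_a, protein_b) pairs.
    Returns a symmetric matrix (list of lists) of interaction counts per bin pair.
    """
    # Rank proteins from most to least studied; ties broken by name for determinism.
    ranked = sorted(pub_counts, key=lambda p: (-pub_counts[p], p))
    bin_of = {protein: rank // bin_size for rank, protein in enumerate(ranked)}
    n_bins = (len(ranked) + bin_size - 1) // bin_size
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for a, b in interactions:
        if a not in bin_of or b not in bin_of:
            continue  # skip proteins outside the ranked space
        i, j = bin_of[a], bin_of[b]
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # keep the matrix symmetric
    return matrix
```

In such a matrix, a dense zone shows up as high counts concentrated in the bins of the most-studied proteins, whereas a homogeneous map spreads counts evenly across all bin pairs.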

A Proteome-wide Binary Interactome Map 

Based on literature-curated information, the human interactome 
appears to be restricted to a narrow dense zone, suggesting that 
half of the human proteome participates only rarely in the inter- 
actome network. Alternatively, the zone that appears sparse in 
the literature could actually be homogeneously populated by 
PPIs that have been overlooked due to sociological or experi- 
mental biases. 

To distinguish between these possibilities and address other 
fundamental questions outlined above, we generated a new pro- 
teome-scale binary interaction map. By acting on all four param- 
eters of our empirically-controlled framework (Venkatesan et al., 
2009), we increased the coverage of the human binary interactome
with respect to our previous human interactome data set
obtained by investigating a search space defined by ~7,000 protein-coding
genes (“Space I”) and published in 2005 (HI-I-05)
Figure 1. Vast Uncharted Interactome Zone in Literature and Generation of a Systematic Binary Data Set

(A) Validation of binary literature pairs extracted from public databases (Bader et al., 2003; Berman et al., 2000; Chatr-Aryamontri et al., 2013; Kerrien et al., 2012; 
Licata et al., 2012; Keshava Prasad et al., 2009; Salwinski et al., 2004). Fraction of pairs recovered by MAPPIT at increasing RRS recovery rates (top left) and at 1%
RRS recovery rate (bottom left), found to co-occur in the literature as reported in the STRING database (upper right), and recovered by Y2H (lower right). Shading 
and error bars indicate standard error of the proportion. p values, two-sided Fisher’s exact tests. For n values, see Table S6.

(B) Adjacency matrix showing Lit-BM-13 interactions, with proteins in bins of ~350 and ordered by number of publications along both axes. Upper and right
histograms show the median number of publications per bin. The color intensity of each square reflects the total number of interactions between proteins for the 
corresponding bins. Total number of interactions per bin (lower histogram). Number of products from GWAS loci (Hindorff et al., 2009), Mendelian disease 
(Hamosh et al., 2005), and Sanger Cancer Gene Census (Cancer Census) (Futreal et al., 2004) genes per bin (circles). 






1214 Cell 159, 1212-1226, November 20, 2014 ©2014 Elsevier Inc. 








(Rual et al., 2005) (Figures 1C and 1D; see Extended Experimental
Procedures, Section 3). A search space consisting of all
pairwise combinations of proteins encoded by ~13,000 genes
(“Space II”; Table S2B) was systematically probed, representing
a 3.1-fold increase with respect to the HI-I-05 search space. To
gain in sensitivity, we performed the Y2H assay in different strain 
backgrounds that showed increased detection of pairs of a pos- 
itive reference set (PRS) composed of high-quality pairs from the 
literature without increasing the detection rate of RRS pairs. To 
increase our sampling, the entire search space was screened 
twice independently. Pairs identified in this first pass were sub- 
sequently tested pairwise in quadruplicate starting from fresh 
yeast colonies. To ensure reproducibility, only pairs testing pos- 
itive at least three times out of the four attempts and with 
confirmed identity were considered interacting pairs, resulting 
in ~14,000 distinct interacting protein pairs.

We validated these binary interactions using three binary pro- 
tein interaction assays that rely on different sets of conditions 
than the Y2H assay: (1) reconstituting a membrane-bound receptor
complex in mammalian cells using MAPPIT, (2) in vitro using
the well-based nucleic acid programmable protein array 
(wNAPPA) assay (Braun et al., 2009; Ramachandran et al., 
2008), and (3) reconstituting a fluorescent protein in Chinese 
hamster ovary cells using a protein-fragment complementation 
assay (PCA) (Nyfeler et al., 2005) (see Extended Experimental 
Procedures, Section 4). The Y2H pairs exhibited validation rates 
that were statistically indistinguishable from a PRS of ~500 Lit- 
BM interactions while significantly different from an RRS of 
~700 pairs with all three orthogonal assays and over a large range 
of score thresholds (Figure 1D, Tables S2A and S2C), demonstrating
the quality of the entire data set. Using three-dimensional
cocrystal structures available for protein complexes in the Protein 
Data Bank (Berman et al., 2000) and for domain-domain interactions
(Stein et al., 2011) (Figure S2 and Tables S2D, S2E, and S2F;
see Extended Experimental Procedures, Sections 5 and 6), we
also demonstrated that our binary interactions reflect direct biophysical
contacts, a conclusion in stark contrast to a previous
report suggesting that Y2H interactions are inconsistent with
structural data (Edwards et al., 2002). Our results also suggested
that Y2H sensitivity correlates with the number of residue-residue
contacts and thus presumably with interaction affinity. The
corresponding human interactome data set covering Space II
and reported in 2014 (HI-II-14; Table S2G) is the largest experimentally
determined binary interaction map yet reported, with
13,944 interactions among 4,303 distinct proteins.

Overall Biological Significance 

To assess the overall functional relevance of HI-II-14, we
combined computational analyses with a large-scale experimental
approach. We first measured enrichment for shared
Gene Ontology (GO) terms and phenotypic annotations and
observed that HI-II-14 shows significant enrichments that are
similar to those of Lit-BM-13 (Figures 2A and 2B; see Extended
Experimental Procedures, Section 7). Second, we measured
how much binary interactions from HI-II-14 reflect membership
in larger protein complexes as annotated in CORUM (Ruepp
et al., 2010) or reported in a cocomplex association map
(Woodsmith and Stelzl, 2014). In both cases, we observed a
significant enrichment for binary interactions between protein
pairs that belong to a common complex (p < 0.001; Figure 2B).
Third, we performed a similar analysis using tissue-specific
mRNA expression data across the 16 human tissues of the Illumina
Human Body Map 2.0 project as well as cellular compartment
localization annotations from the GO Slim terms. Again,
HI-II-14 was enriched for interactions mediated by protein pairs
present in at least one common compartment or cell type (Figures
2C and 2D). Finally, we measured the overlap of HI-II-14
with specific biochemical relationships, as represented by
kinase-substrate interactions. Both HI-II-14 and Lit-BM-13
contained significantly more PPIs reflecting known kinase-substrate
relationships (Hornbeck et al., 2012) than the corresponding
degree-controlled randomized networks (Figure 2E).
In addition, HI-II-14 tended to connect tyrosine and serine/threonine
kinases (Manning et al., 2002) to proteins with tyrosine
or serine/threonine phospho-sites (Hornbeck et al., 2012;
Olsen et al., 2010), respectively (Figure S3A), pointing to
the corresponding interactions being genuine kinase-substrate
interactions. In short, our systematic interactome map,
which was generated independently from any pre-existing
biological information, reveals functional relationships at
levels comparable to those seen for the literature-based
interaction map.
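
Comparisons against degree-controlled randomized networks, as used above, can be illustrated with a generic edge-swap procedure and an empirical p value. The Python sketch below is one common way to build such a null model and is not necessarily the procedure used for HI-II-14; the function names, the number of swaps, and the +1 pseudocount in the p value are assumptions of this sketch.

```python
import random
from collections import Counter


def degrees(edges):
    """Return the degree of every node in an edge list."""
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return counts


def degree_preserving_shuffle(edges, n_swaps, rng):
    """Randomize a simple undirected network while keeping every node's degree.

    Repeatedly picks two edges (a, b) and (c, d) and rewires them to
    (a, d) and (c, b), rejecting swaps that would create a self-loop
    or a duplicate edge.
    """
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # consider both orientations of the second edge
        if a == d or c == b:
            continue  # would create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present or new1 == new2:
            continue  # would create a duplicate edge
        present.discard(frozenset((a, b)))
        present.discard(frozenset((c, d)))
        present.update((new1, new2))
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def count_shared(edges, annotation):
    """Count edges whose two endpoints share at least one annotation term."""
    return sum(1 for a, b in edges
               if annotation.get(a, set()) & annotation.get(b, set()))


def empirical_p(observed, null_values):
    """Fraction of null values at least as large as observed, with a +1 pseudocount."""
    hits = sum(1 for v in null_values if v >= observed)
    return (hits + 1) / (len(null_values) + 1)
```

In use, count_shared would be evaluated on the real network and on each of, for example, 1,000 shuffled networks, and empirical_p would then compare the observed count with the resulting null distribution.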

To further investigate the overall biological relevance of HI-II-14,
we used an experimental approach that compares the
impact of mutations associated with human disorders to that
of common variants with no reported phenotypic consequences
on biophysical interactions (Figure 3). Our rationale is that a set of
interactions corresponding to genuine functional relationships
should be more likely to be perturbed by disease-associated mutations
than by common variants. The following example
illustrates this concept. Mutations R24C and R24H in CDK4 are
clearly associated with melanoma by conferring resistance to
CDKN2A inhibition (Wolfel et al., 1995), whereas N41S and
S52N mutations are of less clear clinical significance (Zhong
et al., 2009) and have remained functionally uncharacterized.
HI-II-14 contains five CDK4 interactors: two inhibitors (CDKN2C
and CDKN2D), two cyclins (CCND1 and CCND3), and HOOK1, a
novel interacting partner and a potential phosphorylation target


(C) Improvements from first-generation to second-generation interactome mapping based on an empirically-controlled framework (Venkatesan et al., 2009). 
Completeness: fraction of all pairwise protein combinations tested; Assay sensitivity: fraction of all true biophysical interactions that are identifiable by a 
given assay; Sampling sensitivity: fraction of identifiable interactions that are detected in the experiment; Precision: fraction of reported pairs that are true 
positives. PRS: positive reference set; RRS: random reference set. 

(D) Experimental pipeline for identifying high-quality binary protein-protein interactions (left). ORF: open reading frame. Fraction of HI-II-14, PRS, and RRS pairs
(right) recovered by MAPPIT, PCA, and wNAPPA at increasing assay stringency. Shading indicates standard error of the proportion. p > 0.05 for all assays when
comparing PRS and HI-II-14 at 1% RRS, two-sided Fisher’s exact tests. For n values, see Table S6.

See also Figures S1 and S2 and Tables S1 and S2.
Figure 2. Overall Biological Significance

(A) Schematic of the method to assess biological 
relevance of binary maps. 

(B) Enrichment of binary interactome maps for 
functional relationships (left) and cocomplex 
memberships (right). Error bars indicate 95% 
confidence intervals. BP: Biological process; MF: 
Molecular function; CC: Cellular component. 
Mouse phenotypes: Shared phenotypes in mouse 
models by orthology mapping. MS: Mass-spec- 
trometry-based map. Enrichments: p < 0.05 for all 
annotations and maps, two-sided Fisher’s exact 
tests. For n values, see Table S6. 

(C) and (D) Fraction of binary interactions between 
proteins localized in a common cellular compart- 
ment and proteins copresent in at least one cell 
type (arrows) compared to those in 1,000 degree-controlled
randomized networks. Empirical p
values. For n values, see Table S6. 

(E) Number of known kinase-substrate interactions 
found in binary maps (arrows) compared to those 
in 1,000 randomized networks. Empirical p values
are shown. 

See also Figure S3. 
of CDK4 (Figure S3B). In agreement with previous reports, the
comparative interaction profile shows that R24C and R24H, 
but not N41S and S52N, specifically perturb CDK4 binding to 
CDKN2C (Figure 3). 

In total, we identified 32 human genes for which: (1) the corresponding
gene product is reported to have binary interactors
in HI-II-14, (2) germline disease-associated missense mutations
have been reported, and (3) common coding missense variants
unlikely to be involved in any disease have been identified in the
1000 Genomes Project (1000 Genomes Project Consortium,
2012). To avoid overrepresentation of certain genes, we selected
a total of 115 variants, testing up to four disease and
four common variants per disease gene for their impact on
the ability of the corresponding proteins to interact with known
interaction partners (see Extended Experimental Procedures,
Section 8). Disease variants were 10-fold more likely to perturb
interactions than nondisease variants (Figure 3 and Table S3).
Strikingly, more than 55% of the 107 HI-II-14 interactions tested
were perturbed by at least one disease-associated variant, and
the same trend was observed when considering only mutants
with evidence of expression in yeast as indicated by their ability
to mediate at least one interaction (Figure S3C). Examples of
novel specifically perturbed interactions include AANAT-BHLHE40
and RAD51D-IKZF1 (Figure 3).
In the first case, the A129T mutation in
AANAT is known to be associated with
a delayed sleeping phase syndrome
and specifically perturbs an interaction
between AANAT and BHLHE40, the
product of a gene reported to function
in circadian rhythm regulation (Nakashima
et al., 2008). In the second case,
the breast-cancer-associated RAD51D
E233G mutation perturbs interactions
with a number of partners, including the known cancer gene
product IKZF1 (Futreal et al., 2004).

Altogether these computational and experimental results provide
strong evidence that HI-II-14 pairs correspond to biologically
relevant interactions and represent a valuable resource to
further our understanding of the human interactome and its per- 
turbations in human disease. 

A “Broader” Interactome 

Unlike literature-curated interactions, HI-II-14 protein pairs are
distributed homogeneously across the interactome space (Figure 4A),
indicating that sociological biases, and not fundamental
biological properties, underlie the existence of a densely populated
zone in the literature. Since 1994, the number of high-quality
binary literature PPIs has grown roughly linearly to reach
~11,000 interactions in 2013 (Figure 4B), while systematic data
sets are punctuated by a few large-scale releases. Although
the sparse territory of the literature map gradually gets populated,
interaction density in this zone continues to lag behind
that of the dense zone (Figure 4B). In terms of proteome
coverage, the expansion rate is faster for systematic maps
than for literature maps, especially in the sparse territory (Figure 4C
and Figure S4A; see Extended Experimental Procedures,
Figure 3. Perturbations of Protein Interactions by Disease and Common Variants

Predicted effect of mutations on PPIs as a function of their biological relevance (top left). Fraction of interactions of the wild-type gene product lost by mutants 
bearing the disease-associated or common variants (top right; error bars indicate standard error of the proportion). p value, two-sided Fisher’s exact test.
Comparison of interaction profile of wild-type CDK4, AANAT, and RAD51D to the interaction profile of mutants bearing disease or common variants (bottom). 
Yeast growth phenotypes on SC-Leu-Trp-His+3AT media in quadruplicate experiments are shown. 

See also Figure S3 and Table S3. 



Section 9). While Lit-BM-13 provides more information in the 
dense zone, HI-II-14 reveals interactions for more than 2,000
proteins absent from Lit-BM-13. These observations are likely 
due to a tendency of the literature map to expand from already 
connected proteins (Figure 4D). 



To more deeply explore the heterogeneous coverage of the 
human interactome, we compared HI-II-14 and Lit-BM-13 to a
collection of ~25,000 predicted binary PPIs of high confidence
(PrePPI-HC) (Zhang et al., 2012) and a co-fractionation map of
~14,000 potentially binary interactions (Co-Frac) (Havugimana
Figure 4. A “Broader” Interactome

(A) Adjacency matrices showing Lit-BM-13 (blue) and HI-II-14 (purple) interactions, with proteins in bins of ~350 and ordered by number of publications along
both axes. The color intensity of each square reflects the total number of interactions for the corresponding bins. 

(B) Total number of binary interactions in literature and systematic interactome maps over the past 20 years (top), with years reflecting either date of public release 
of systematic binary data sets or date of publication that resulted in inclusion of interactions in Lit-BM-13. Adjacency matrices (bottom) as in Figure 4A. 

(C) Fraction of the human proteome present in binary interactome maps at selected time points since 1994, considering the full interactome space (left) or only 
dense (middle) and sparse (right) zones of Lit-BM-13 with respect to number of publications. 

(D) Fraction of new interactions connecting two proteins that were both absent from the map at the previous time point (four-year interval; middle) compared to
the average in 1,000 randomized networks (right). Error bars indicate standard deviation.



et al., 2012). We tested the extent to which these two data sets 
contain binary interactions (see Extended Experimental Proce- 
dures, Section 10). Representative samples from both Co-Frac 
and PrePPI-HC were recovered by Y2H at a much lower rate 
than a sample of Lit-BM-13 and appeared statistically 
indistinguishable from random pairs (Figure 5A and Table S4A). A 
literature non-binary data set (Lit-NB-13) performed similarly. 
However, Co-Frac and PrePPI-HC, like Lit-NB-13, were both 
significantly enriched for functionally relevant relationships. 
Thus, although these data sets represent potentially valuable 
resources, both Co-Frac and PrePPI-HC appear to be more 
comparable to nonbinary than to binary data sets. Surprisingly, 
even though PrePPI-HC and Co-Frac systematically surveyed 
the full human proteome and map different portions of the 
interactome (Figure S4B), both exhibit a strong tendency to report 
interactions among well-studied proteins (Figure 5B). This bias 
is likely due to the integration of functional annotations in the 
generation of both data sets. 

Because coverage might depend on gene expression levels, 
we also examined interactome maps for expression-related 
sparse versus dense zones. Co-Frac shows a strong bias toward 
interactions involving proteins encoded by genes highly ex- 
pressed in the cell lines used (Figure 5B). This expression-depen- 
dent bias is echoed in the literature map, perhaps reflecting a 
general tendency to study highly expressed proteins. In contrast, 
both HI-II-14 and PrePPI-HC exhibit a uniform interaction density 
across the full spectrum of expression levels, likely explained by 
the standardized expression of proteins tested in Y2H and by the 
independence of homology-based predictions from expression 
levels. 

We more broadly explored the intrinsic biases that might influ- 
ence the appearance of sparsely populated zones by examining 
21 protein or gene properties, roughly classified as expression-, 
sequence-, or knowledge-based (Figures 5B and 5C, Tables S4B 
and S4C; see Extended Experimental Procedures, Section 9). 
For example, PrePPI-HC is virtually devoid of interactions be- 
tween proteins lacking Pfam domains, consistent with conserved 
domains forming the basis of the prediction method. HI-II-14 
appears depleted of interactions among proteins containing pre- 
dicted transmembrane helices, consistent with expected limita- 
tions of the Y2H assay (Stagljar and Fields, 2002). Co-Frac is 
similarly depleted in interactions involving proteins with trans- 
membrane helices, which may result from membrane-bound 
proteins being filtered out during biochemical fractionations. 
Compared to HI-II-14, HI-I-05 presented a less homogeneous 
coverage of the space with respect to abundance and knowl- 
edge properties, likely reflecting the content of early versions 
of the hORFeome (Figure S4C). Importantly, no single map 
appeared unbiased in all 21 examined properties. A combined 
map presented a slightly increased homogeneity although 
intrinsic knowledge biases of the three maps using literature- 
derived evidence were still predominant. 

To confirm that HI-II-14 interactions found in the sparse zones 
of the three other maps are of as high quality as those found in 
dense zones, we compared MAPPIT validation rates and func- 
tional enrichment across these zones for all protein properties 
examined. MAPPIT validation rates of dense and sparse zone 
pairs were consistent for nearly all properties (Figures 5D and 
S4D), indicating that HI-II-14 interactions are of similar biophysical 
quality throughout the full interactome space. Functional enrich- 
ment within the sparse zone was statistically indistinguishable 
from that of the dense zone (Figures 5D and S4E), demonstrating 
the functional importance of HI-II-14 biophysical interactions in 
zones covered sparsely by other types of interactome maps. 

Considering all current maps, more than half of the proteome 
is now known to participate in the interactome network. Our sys- 
tematic exploration of previously uncharted territories dramati- 
cally expands the interactome landscape, suggesting that the 



human interactome network is broader in scope than previously 
observed and that the entire proteome may be represented 
within a fully mapped interactome. 

Interactome Network and Cancer Landscape 

Genes associated with the same disease are believed to be 
preferentially interconnected in interactome networks (Barabasi 
et al., 2011; Vidal et al., 2011). However, in many cases, these 
observations were made with interactome maps that are 
composites of diverse evidence, e.g., binary PPIs, cocomplex 
memberships, and functional associations, a situation further 
complicated by the uneven quality and sociological biases 
described above. Using HI-II-14, we revisited this concept for 
cancer gene products. Our goal was to investigate whether the 
cancer genomic landscape is limited to the known cancer genes 
curated in the Sanger Cancer Gene Census (“Cancer Census”) 
(Futreal et al., 2004), or if, alternatively, it might extend to some 
of the hundreds of additional candidate genes enriched in so- 
matic mutations uncovered by systematic cancer genome 
sequencing (“SM genes”) (Chin et al., 2011) and/or identified 
by functional genomic strategies such as Sleeping Beauty trans- 
poson-based screens in mice (“SB genes”) (Copeland and Jen- 
kins, 2010) or global investigations on DMA tumor virus targets 
(“VT genes”) (Rozenblatt-Rosen et al., 2012). 

Given our homogeneous coverage of the space for known 
(Cancer Census) and candidate (SB, SM, and VT) cancer genes 
(Figure 6A), we first tested the postulated central role of cancer 
gene products in biological networks (Barabasi et al., 2011) and 
verified that both sets tend to have more interactions and to be 
more central in the systematic map than proteins not associ- 
ated with cancer (Figure 6B). We then examined the intercon- 
nectivity of known cancer proteins and showed that Cancer 
Census gene products interact with each other more frequently 
than expected by chance, a trend not apparent in HI-I-05 
(Figure 6C). We sought to use this topological property as the basis 
for novel cancer gene discovery in the large lists of cancer can- 
didates from genomic and functional genomic screens. 
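
The comparison against chance described above can be illustrated with a minimal sketch. It assumes that "expected by chance" refers to degree-controlled randomized networks, as stated in the legend of Figure 6C, and it uses simple edge swapping as the randomization, which is an assumption and not necessarily the procedure used in this study. The function names, the number of swaps, and the (k + 1)/(n + 1) form of the empirical p value are illustrative choices.

```python
import random
from collections import Counter


def degrees(edges):
    """Number of interactions per protein."""
    return Counter(p for edge in edges for p in edge)


def degree_preserving_shuffle(edges, n_swaps, rng):
    """Randomize a network by edge swapping; every protein keeps its degree."""
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:  # pick one of the two possible rewirings
            c, d = d, c
        if len({a, b, c, d}) < 4:
            continue  # swap would create a self-interaction
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue  # swap would duplicate an existing interaction
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def count_within(edges, gene_set):
    """Interactions whose two partners both belong to gene_set."""
    return sum(1 for a, b in edges if a in gene_set and b in gene_set)


def empirical_p(edges, gene_set, n_random=1000, seed=0):
    """Observed count and fraction of randomized networks reaching it."""
    rng = random.Random(seed)
    observed = count_within(edges, gene_set)
    at_least = 0
    for _ in range(n_random):
        shuffled = degree_preserving_shuffle(edges, 10 * len(edges), rng)
        if count_within(shuffled, gene_set) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_random + 1)
```

Here `gene_set` would hold the Cancer Census gene products and `edges` the interactions of the map under study; a small p value indicates more interconnection than in networks with the same degree distribution.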

We examined whether products of candidate cancer genes 
identified by GWAS (Table S5A) tend to be connected to Cancer 
Census proteins, and observed significant connectivity in all four 
maps (Figure S5A; see Extended Experimental Procedures, Sec- 
tion 11). When loci containing a known cancer gene were 
excluded, only HI-II-14 showed such connectivity, supporting 
its unique value to identify cancer candidate genes beyond those 
already well demonstrated (Figures 7A and S5A). In further 
support of their association with cancer, genes in cancer 
GWAS loci prioritized by “guilt-by-association” in HI-II-14 tend 
to correspond to cancer candidates from systematic cancer 
studies (Figures 7B and 7C). These results suggest that can- 
cer-associated proteins tend to form subnetworks perturbed in 
tumorigenesis, and that HI-II-14 provides new context to 
prioritize cancer genes from genome-wide studies. 

The following example illustrates the power of our combined 
approach. C-terminal Binding Protein 2 (CTBP2) is encoded 
at a locus associated with prostate cancer susceptibility 
(Thomas et al., 2008) and belongs to both SB and VT gene lists 
(Mann et al., 2012; Rozenblatt-Rosen et al., 2012). Two Cancer 
Census genes, IKZF1 and FLI1, encode interacting partners of 
Figure 5. Comparison of Interaction Mapping Approaches 

(A) Evaluation of the quality of Co-Frac (orange), PrePPI-HC (red), and pairs from small-scale experiments in the literature with no binary evidence (Lit-NB-13, 
grey). Fraction of pairs recovered by Y2H as compared to pairs from Lit-BM-13 and pairs of randomly selected proteins (RRS) (left). Error bars indicate standard 
error of the proportion. Enrichment in functional interactions and cocomplex memberships (right). Legend as in Figure 2B. For n values, see Table S6. 

(B) Adjacency matrices for HI-II-14, Lit-BM-13, Co-Frac, and PrePPI-HC maps, with proteins per bins of ~350 and ordered by number of publications, mRNA 
abundance in HEK cells, fraction of protein sequence covered by Pfam domains, or fraction of protein sequence in transmembrane helices. Figure legend as in 
Figure 1B. 

(C) Highest interaction density imbalances (observed minus expected) in the four maps, the union of all four maps, and our previous binary map (HI-I-05) for 21 
protein properties. 



CTBP2 in HI-II-14. These are transcription factors with tumor 
suppressor (Payne and Dovat, 2011) and proto-oncogene (Korn- 
blau et al., 2011) roles, respectively, in lymphoid tumors. Given 
its interactions with IKZF1 and FLI1, we investigated the potential 
role of CTBP2 in lymphoid tumorigenesis. In the Cancer Cell 
Line Encyclopedia (Barretina et al., 2012), FLI1 was significantly 
more often amplified in lymphoid than in other cell lines (Fig- 
ure 7D), consistent with its proposed proto-oncogenic role in 
these tumors. In contrast, both CTBP2 and IKZF1, but not 
CTBP1, were deleted significantly more often in lymphoid cancer 
cell lines. Notably, deletion of CTBP2 or IKZF1 and amplification 
of FLI1 were mostly nonoverlapping in the different cell lines, 
suggesting that either event may be sufficient to affect 
tumorigenesis (Figure S5B). Altogether, these results suggest a role 
for CTBP2 in suppressing lymphoid tumors by direct repression 
of FLI1 function, potentially involving IKZF1. 

Finally, we assessed how HI-II-14 interactions can be 
integrated with genomic and functional genomic data sets. Going 
beyond the “guilt-by-profiling” concept, we also used these 
gene sets in “guilt-by-association” predictions in a combined 
model (Figure S6A), which leads to substantially improved can- 
cer gene rankings over those found using either predictive strat- 
egy alone (Figures 7E, S6B, and S6C and Table S5B; see 
Extended Experimental Procedures, Section 12). In contrast, a 
similar analysis using HI-I-05 interactions showed that its limited 
size prevented inclusion of any guilt-by-association terms 
(Figure S6D). Genes significantly mutated in cancer patients 
from recent TCGA pan-cancer mutation screens (Table S5C) 
(Lawrence et al., 2014) were enriched among highly ranked pre- 
dictions from the combined model (p = 6 x 1 0“^, one-sided Wil- 
coxon rank test), supporting the validity of our integrated cancer 
gene predictions. Our top-ranked prediction was the cyclin- 
dependent kinase 4 (CDK4), a well-known cancer gene product. 
Four other genes from the Cancer Census list appeared among 
the top 25 ranked genes. Strikingly, STAT3, which ranked third, 
was added to the Cancer Census after our training set was es- 
tablished, highlighting the ability of this approach to identify 
novel cancer gene products. 

To characterize the biological processes in which the candi- 
date cancer genes predicted by the combined model are likely 
to be involved, we identified binary interactions linking them to 
each other or to Cancer Census proteins in the 12 “pathways 
of cancer” relevant to cancer development and progression 
(Table S5D) (Vogelstein et al., 2013). Of our top 100 candidates, 
60 mapped to at least one cancer pathway (Figures 7F and S7), 
twice as many as would be expected from predictions using 
either the guilt-by-profiling or guilt-by-association approach 
alone. We propose that many novel cancer candidates can be an- 
notated to specific processes based on their interactions with 
Cancer Census gene products and known participation in cellular 
pathways. For example, the candidate protein ID3, a DNA-binding 
inhibitor, interacts with the two Cancer Census transcription 
factors TCF12 and TCF3, suggesting a role for ID3 in the regulation 
of transcription by inhibiting binding of specific transcription 
factors to DNA (Loveys et al., 1996; Richter et al., 2012). CTBP2, 
which we identified as a potential suppressor in lymphoid tumors, 
represents another example (Figures 5E and S7). 

In summary, the increased and uniform coverage of HI-II-14 
demonstrates that known and candidate cancer gene products 
are highly connected in the interactome network, which in turn 
provides unbiased evidence for an expanded functional cancer 
landscape. 

DISCUSSION 

By systematically screening half of the interactome space with 
minimal inspection bias, we more than doubled the number of 
high-quality binary PPIs available from the literature. Covering 
zones of the human interactome landscape that have been 
weakly charted by other approaches, our systematic binary 
map provides deeper functional context to thousands of pro- 
teins, as demonstrated for candidates identified in unbiased 
cancer genomic screens. Systematic binary mapping therefore 
stands as a powerful approach to “connect the dots” of the 
genomic revolution. 

Combining high-quality binary pairs from the literature with 
systematic binary maps, 30,000 high-confidence interactions 
are now available. It is likely that a large proportion of the human 
interactome can soon be mapped by taking advantage of the 
emergence of reference proteome maps (Kim et al., 2014; 
Wilhelm et al., 2014), a combination of nearly complete clone 
collections (Yang et al., 2011), rapid improvements in Y2H assay 
sensitivity, and emerging interaction-mapping technologies 
that drastically reduce cost (Caufield et al., 2012; Stagljar and 
Fields, 2002; Yu et al., 2011). 

Reference binary interactome maps of increased coverage 
and quality will be required to interpret condition-specific inter- 
actions and to characterize the effects of splicing and genetic 
variation on interactions (Zhong et al., 2009). While protein-pro- 
tein interactions represent an important class of interactions be- 
tween macromolecules, future efforts integrating this information 
with protein-DNA, protein-RNA, RNA-RNA, or protein-metabolite 
interactions will provide a unified view of the molecular inter- 
actions governing cell behavior. Just as a reference genome 
enabled detailed maps of human genetic variation (1000 Ge- 
nomes Project Consortium, 2012), completion of a reference 
interactome network map will enable deeper insight into geno- 
type-phenotype relationships in human. 

EXPERIMENTAL PROCEDURES 
Extraction of the Literature-Based Data Sets 

Human PPIs annotated with tractable publication records were extracted from 
seven databases through August 2013. Large-scale systematic data sets and 



(D) Precision at 1% RRS recovery in the MAPPIT assay (top, error bars indicate standard error of the proportion) and functional enrichment (bottom, union of Gene 
Ontology and mouse phenotypes based annotations, error bars indicate 95% confidence intervals) of HI-II-14 pairs found in dense and sparse zones mirrored 
from Lit-BM-13, Co-Frac, and PrePPI-HC. p > 0.05 for all pairwise comparisons of dense and sparse zones, two-sided Fisher’s exact tests. For n values, see 
Table S6. 

See also Figure S4 and Table S4. 
Figure 6. Network Properties of Cancer 
Gene Products 

(A) Adjacency matrices for Lit-BM-13 and HI-II-14 
only showing interactions involving the product of 
a Cancer Census (Futreal et al., 2004) or of a 
candidate cancer gene. Figure legend as in Fig- 
ure 1B. Lower histograms show for each bin, the 
fraction of cancer candidates having at least one 
interaction. 

(B) Distribution of the number of interactions (de- 
gree) and normalized number of shortest paths 
between proteins (betweenness centrality) for 
products of Cancer Census and of candidate 
cancer genes in Lit-BM-13 and in HI-II-14 maps as 
compared to other proteins (right; * for p < 0.05, NS 
for p > 0.05, two-sided Wilcoxon rank sum tests). 
For n values, see Table S6. 

(C) Number of interactions between products of 
Cancer Census genes (arrows) in HI-I-05, HI-II-14, 
Lit-BM as of 2000 (Lit-BM-00) and as of 2013 (Lit- 
BM-13), as compared to 1,000 degree-controlled 
randomized networks. Empirical p values. For n 
values, see Table S6. 



Generation of the Binary Protein-Protein 
Interaction Map 

HI-II-14 was generated by screening all pairwise 
combinations of 15,517 ORFs from hORFeome 
v5.1 (Space II) as described previously (Dreze 
et al., 2010). ORFs encoding first pass pairs were 
identified either by Sanger sequencing or by 
Stitch-seq (Yu et al., 2011). HI-II-14 was validated 
by comparing a subset of 809 interactions to a 
positive and a random reference set of 460 and 
698 protein pairs, respectively, using MAPPIT, 
PCA, and wNAPPA assays. 

Interaction Perturbation by Missense 
Mutations 

Disease variants were obtained from the Human 
Gene Mutation Database (HGMD2009 V2) (Stenson 
et al., 2014) and common variants were derived from 
the 1000 Genomes Project (1000 Genomes Project 
Consortium, 2012). Only variants with a minor allele 
frequency above 1% were considered common. All 
successfully cloned disease and common variants 
were systematically tested for interaction with all 
interactors of their wild-type counterpart. 
pairs involving products of UBC, SUMO1, SUMO2, SUMO3, SUMO4, or 
NEDD8, were excluded. The remaining pairs were divided into those having 
no pieces of binary evidence (Lit-NB) and those with at least one piece of bi- 
nary evidence based on PSI-MI experimental method codes. Binary pairs 
were divided between pairs with one and with two or more pieces of evidence 
(Lit-BS and Lit-BM, respectively). For benchmark experiments in Y2H, 
MAPPIT, PCA, and wNAPPA, equivalent data sets were extracted similarly 
in December 2010. 
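
The partition into Lit-NB, Lit-BS, and Lit-BM described above can be sketched as follows. This is a minimal illustration, not the curation pipeline itself: the set of PSI-MI method codes treated as binary (`BINARY_METHODS`) is a hypothetical two-entry example, and counting one "piece of evidence" per distinct (method, publication) combination is an assumption made for the sketch.

```python
# Illustrative only: the full list of PSI-MI codes considered binary
# is not given in this section.
BINARY_METHODS = {"MI:0018", "MI:0090"}


def classify_pairs(evidence):
    """Label each protein pair from its curated evidence records.

    evidence: iterable of (protein_a, protein_b, method_code, publication_id).
    Returns a dict mapping a sorted pair to "Lit-NB", "Lit-BS", or "Lit-BM".
    """
    binary_evidence = {}
    for a, b, method, pub in evidence:
        pair = tuple(sorted((a, b)))
        pieces = binary_evidence.setdefault(pair, set())
        if method in BINARY_METHODS:
            # Assumption: one piece of evidence per (method, publication).
            pieces.add((method, pub))
    labels = {}
    for pair, pieces in binary_evidence.items():
        if not pieces:
            labels[pair] = "Lit-NB"  # no binary evidence
        elif len(pieces) == 1:
            labels[pair] = "Lit-BS"  # a single piece of binary evidence
        else:
            labels[pair] = "Lit-BM"  # two or more pieces of binary evidence
    return labels
```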



Interaction Density Imbalance 

For each protein property, we ranked all proteins 
and, for any property threshold, partitioned the in- 
teractome space into a first region containing pairs 
of proteins both above (or below) the threshold, 
and a second region containing all remaining pairs. 
Interaction density imbalance of a given map for a 
given threshold was calculated as the fraction of PPIs observed in the first re- 
gion minus the fraction of PPIs expected assuming a uniform distribution in the 
space. Dense and sparse zones were defined by identifying the threshold for 
which the deviation from expectation is maximal. 
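
A minimal sketch of this calculation is given below, assuming that proteins are supplied already ranked by the property, that the first region holds pairs among the top-k proteins, and that self-interactions are excluded from the space. The function names are illustrative.

```python
def density_imbalance(ranked_proteins, ppis, k):
    """Observed minus expected fraction of PPIs among the top-k proteins.

    ranked_proteins: proteins ordered by the property of interest.
    ppis: list of (protein_a, protein_b) pairs within ranked_proteins.
    """
    top = set(ranked_proteins[:k])
    n = len(ranked_proteins)
    observed = sum(1 for a, b in ppis if a in top and b in top) / len(ppis)
    # Uniform expectation: share of all unordered pairs lying in the region.
    expected = (k * (k - 1) / 2) / (n * (n - 1) / 2)
    return observed - expected


def max_imbalance(ranked_proteins, ppis):
    """Threshold with the largest absolute deviation from expectation."""
    n = len(ranked_proteins)
    candidates = (
        (density_imbalance(ranked_proteins, ppis, k), k) for k in range(2, n)
    )
    return max(candidates, key=lambda item: abs(item[0]))
```

The threshold returned by `max_imbalance` would separate the dense zone from the sparse zone for that property; ranking in the opposite direction covers the "below the threshold" case.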

Measure of Functional Enrichment 

For each pairwise comparison, PPI and functional maps were trimmed to pairs 
where both proteins were present in both maps and restricted to Space II to 
allow comparison between PPI maps. Functional enrichment odds ratios were 
calculated using Fisher’s exact tests. 
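
The trimming step and the odds ratio can be sketched as below. The sketch assumes that the background is every unordered pair of proteins present in both maps, omits the restriction to Space II, and implements the two-sided Fisher's exact test directly from the hypergeometric distribution; function names are illustrative.

```python
from math import comb


def proteins_in(pairs):
    return {p for pair in pairs for p in pair}


def functional_enrichment(ppi_pairs, functional_pairs):
    """2x2 table and odds ratio after trimming both maps to shared proteins."""
    shared = proteins_in(ppi_pairs) & proteins_in(functional_pairs)

    def trim(pairs):
        return {frozenset(p) for p in pairs if p[0] != p[1] and set(p) <= shared}

    ppi, func = trim(ppi_pairs), trim(functional_pairs)
    n = len(shared)
    total = n * (n - 1) // 2  # all unordered pairs of shared proteins
    a = len(ppi & func)       # PPI and functionally related
    b = len(ppi - func)       # PPI only
    c = len(func - ppi)       # functionally related only
    d = total - a - b - c     # neither
    odds = a * d / (b * c) if b * c else float("inf")
    return (a, b, c, d), odds


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
```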

GWAS Analysis 

307 distinct cancer-associated SNPs were identified from 75 GWAS 
publications covering 10 types of cancer and 142 distinct loci were identified at a 
linkage disequilibrium threshold of 0.9. For each map, we calculated the number 
of loci encoding an interactor of a Cancer Census protein over the number of 
loci encoding a protein in the PPI map. To assess significance, we measured 
the corresponding fraction when randomly selecting for each locus the same 
number of proteins as genes with products in the PPI map. 
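
The randomization can be sketched as follows. Two details are assumptions made for illustration: random proteins are drawn from all proteins present in the PPI map, and the empirical p value is computed as (k + 1)/(n + 1), where k is the number of randomizations reaching the observed fraction. Names are illustrative.

```python
import random


def locus_fraction(loci, census_interactors):
    """Fraction of loci (with a protein in the map) encoding an interactor."""
    in_map = [genes for genes in loci if genes]
    hits = sum(1 for genes in in_map if any(g in census_interactors for g in genes))
    return hits / len(in_map)


def gwas_empirical_p(loci, map_proteins, census_interactors, n_random=1000, seed=0):
    """Observed fraction and empirical p value from randomized loci.

    loci: list of gene lists, one per GWAS locus.
    map_proteins: proteins with at least one interaction in the PPI map.
    census_interactors: proteins interacting with a Cancer Census protein.
    """
    rng = random.Random(seed)
    pool = sorted(map_proteins)
    # Keep only genes whose products are in the map.
    loci_in_map = [[g for g in genes if g in map_proteins] for genes in loci]
    observed = locus_fraction(loci_in_map, census_interactors)
    at_least = 0
    for _ in range(n_random):
        # Same number of proteins per locus, drawn at random from the map.
        randomized = [rng.sample(pool, len(genes)) for genes in loci_in_map]
        if locus_fraction(randomized, census_interactors) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_random + 1)
```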

Cancer Association Scoring System 

For each gene, seven features were measured. Three features represent 
membership in the SB, SM, and VT lists of candidate cancer genes 
(“guilt-by-profiling” features). The four other features represent its number of 
interactors in HI-II-14 that are present in these three lists and in the Cancer Census 
list, normalized by the expected numbers in degree-controlled randomized 
networks (“guilt-by-association” features). We measured the ability of each 
feature to prioritize known Cancer Census genes with separate logistic regres- 
sion models. We combined all seven features in a forward stepwise logistic 
regression model using the Akaike information criterion to determine the 
stepwise halting. The final set of features selected was: the SB, SM, and VT 
guilt-by-profiling and the Cancer Census and SB guilt-by-association features. 
“Receiver Operating Characteristic” curves were obtained by measuring at 
decreasing score threshold the fraction of known Cancer Census genes recov- 
ered and the corresponding fraction of proteins predicted as candidate cancer 
genes. 
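
The curve described in the last sentence can be sketched as below, given one score per gene from any of the models. Model fitting itself (the stepwise logistic regression) is not shown, ties between scores are broken arbitrarily, and the trapezoidal area is an illustrative summary; function names are assumptions.

```python
def recovery_curve(scores, known_genes):
    """Points (fraction predicted, fraction of known genes recovered).

    scores: dict mapping gene to model score.
    known_genes: set of known Cancer Census genes.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_known = sum(1 for g in ranked if g in known_genes)
    points = [(0.0, 0.0)]
    found = 0
    # Lowering the threshold one gene at a time.
    for i, gene in enumerate(ranked, start=1):
        if gene in known_genes:
            found += 1
        points.append((i / len(ranked), found / n_known))
    return points


def area_under_curve(points):
    """Trapezoidal area under the recovery curve."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
```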

Data Sets 

For reference data sets used in this study, see Extended Experimental Proce- 
dures, Section 13. All high-quality binary PPIs described in this paper can be 
accessed at: http://interactome.dfci.harvard.edu/H_sapiens/. 

SUPPLEMENTAL INFORMATION 

Supplemental Information includes Extended Experimental Procedures, seven 
figures, and six tables and can be found with this article online at 
http://dx.doi.org/10.1016/j.cell.2014.10.050. 
Figure 7. Interactome Network and Cancer Landscape 

(A) Fraction of cancer-related GWAS loci containing at least one gene encoding a protein that interacts with the product of a Cancer Census gene in HI-I-05, 
HI-II-14, Lit-BM-13, Co-Frac, and PrePPI-HC (arrows) as compared to randomly selected loci genes. GWAS loci already containing a Cancer Census gene are 
excluded. Empirical p values. For n values, see Table S6. 

(B) Network representing products of genes in cancer-associated GWAS loci and their interactions with Cancer Census proteins in HI-II-14 (right), and a 
representative example of the network obtained for randomized loci genes (left). 

(C) Fraction of GWAS loci gene products interacting with a Cancer Census protein also identified in systematic genomic and functional genomic studies (arrow) as 
compared to the fraction obtained for randomized loci genes (bottom right). Empirical p value. 

(D) CTBP2 and IKZF1 are deleted in significantly more hematopoietic and lymphoid cancer cell lines than in other cancer cell lines. CCLE, Cancer Cell Line 
Encyclopedia. Each barplot compares the fraction of cell lines from the 163 hematopoietic and lymphoid (hatched bars) or 717 other (empty bars) cell types where 
CTBP1, CTBP2, FLI1, or IKZF1 were found amplified (red) or deleted (blue). Error bars indicate standard error of the proportion, p values, two-sided Fisher’s exact 
tests (NS for p > 0.05). 

(E) Predictive power of guilt-by-profiling and guilt-by-association models compared to the combined model (Figure S6; see Extended Experimental Procedures, 
Section 11). AUC: Area under the curve in Figure S6C. Error bars indicate standard error of the proportion, p value, two-sided Wilcoxon rank sum test. SB, 
Sleeping Beauty transposon-based mouse cancer screen; SM, Somatic mutation screen in cancer tissues; VT, Virus targets. 

(F) Binary interactions from HI-II-14 involving the top candidates and Cancer Census gene products in the twelve pathways associated to cancer development 
and progression. 

See also Figures S5, S6, and S7 and Table S5. 



1224 Cell 159, 1212-1226, November 20, 2014 ©2014 Elsevier Inc. 







Dana-Farber Cancer Institute Strategic Initiative to M.V. I.L. is a postdoctoral 
fellow with the FWO-V. M.V. is a “Chercheur Qualifie Honoraire” from the 
Fonds de la Recherche Scientifique (FRS-FNRS, Wallonia-Brussels Federa- 
tion, Belgium). Since performing the work described, C. Fontanillo has become 
an employee of Celgene Research SL, part of the Celgene Corporation. 

Received: September 17, 2014 
Revised: October 21, 2014 
Accepted: October 30, 2014 
Published: November 20, 2014 

REFERENCES 

1000 Genomes Project Consortium, Abecasis, G.R., Auton, A., Brooks, L.D., 
DePristo, M.A., Durbin, R.M., Handsaker, R.E., Kang, H.M., Marth, G.T., and 
McVean, G.A. (2012). An integrated map of genetic variation from 1,092 human 
genomes. Nature 491, 56-65. 

Bader, G.D., Betel, D., and Hogue, C.W. (2003). BIND: the Biomolecular 
Interaction Network Database. Nucleic Acids Res. 31, 248-250. 

Barabasi, A.-L., Gulbahce, N., and Loscalzo, J. (2011). Network medicine: a 
network-based approach to human disease. Nat. Rev. Genet. 12, 56-68. 

Barretina, J., Caponigro, G., Stransky, N., Venkatesan, K., Margolin, A.A., Kim, 
S., Wilson, C.J., Lehar, J., Kryukov, G.V., Sonkin, D., et al. (2012). The Cancer 
Cell Line Encyclopedia enables predictive modelling of anticancer drug 
sensitivity. Nature 483, 603-607. 

Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., 
Shindyalov, I.N., and Bourne, P.E. (2000). The Protein Data Bank. Nucleic 
Acids Res. 28, 235-242. 

Braun, P., Taşan, M., Dreze, M., Barrios-Rodiles, M., Lemmens, I., Yu, H., 
Sahalie, J.M., Murray, R.R., Roncari, L., de Smet, A.S., et al. (2009). An 
experimentally derived confidence score for binary protein-protein interactions. 
Nat. Methods 6, 91-97. 

Caufield, J.H., Sakhawalkar, N., and Uetz, P. (2012). A comparison and 
optimization of yeast two-hybrid systems. Methods 58, 317-324. 

Chatr-Aryamontri, A., Breitkreutz, B.J., Heinicke, S., Boucher, L., Winter, A., 
Stark, C., Nixon, J., Ramage, L., Kolas, N., O’Donnell, L., et al. (2013). The 
BioGRID interaction database: 2013 update. Nucleic Acids Res. 41 (Database 
issue), D816-D823. 

Chin, L., Hahn, W.C., Getz, G., and Meyerson, M. (2011). Making sense of can- 
cer genomic data. Genes Dev. 25, 534-555. 

Copeland, N.G., and Jenkins, N.A. (2010). Harnessing transposons for cancer 
gene discovery. Nat. Rev. Cancer 10, 696-706. 

Cusick, M.E., Yu, H., Smolyar, A., Venkatesan, K., Carvunis, A.-R., Simonis, N., 
Rual, J.F., Borick, H., Braun, P., Dreze, M., et al. (2009). Literature-curated 
protein interaction datasets. Nat. Methods 6, 39-46. 

Dreze, M., Monachello, D., Lurin, C., Cusick, M.E., Hill, D.E., Vidal, M., and 
Braun, P. (2010). High-quality binary interactome mapping. Methods Enzymol. 
470, 281-315. 

Edwards, A.M., Kus, B., Jansen, R., Greenbaum, D., Greenblatt, J., and Ger- 
stein, M. (2002). Bridging structural biology and genomics: assessing protein 
interaction data with known complexes. Trends Genet. 18, 529-536. 

Eyckerman, S., Verhee, A., Van der Heyden, J., Lemmens, I., Ostade, X.V., 
Vandekerckhove, J., and Tavernier, J. (2001). Design and application of a 
cytokine-receptor-based interaction trap. Nat. Cell Biol. 3, 1114-1119. 

Futreal, P.A., Coin, L., Marshall, M., Down, T., Hubbard, T., Wooster, R., Rah- 
man, N., and Stratton, M.R. (2004). A census of human cancer genes. Nat. Rev. 
Cancer 4, 177-183. 

Hamosh, A., Scott, A.F., Amberger, J.S., Bocchini, C.A., and McKusick, V.A. 
(2005). Online Mendelian Inheritance in Man (OMIM), a knowledgebase of hu- 
man genes and genetic disorders. Nucleic Acids Res. 33 (Database issue), 
D514-D517. 



Havugimana, P.C., Hart, G.T., Nepusz, T., Yang, H., Turinsky, A.L., Li, Z., 
Wang, P.I., Boutz, D.R., Fong, V., Phanse, S., et al. (2012). A census of human 
soluble protein complexes. Cell 150, 1068-1081. 

Hindorff, L.A., Sethupathy, P., Junkins, H.A., Ramos, E.M., Mehta, J.P., 
Collins, F.S., and Manolio, T.A. (2009). Potential etiologic and functional 
implications of genome-wide association loci for human diseases and traits. Proc. 
Natl. Acad. Sci. USA 106, 9362-9367. 

Hornbeck, P.V., Kornhauser, J.M., Tkachev, S., Zhang, B., Skrzypek, E., Mur- 
ray, B., Latham, V., and Sullivan, M. (2012). PhosphoSitePlus: a comprehen- 
sive resource for investigating the structure and function of experimentally 
determined post-translational modifications in man and mouse. Nucleic Acids 
Res. 40 (Database issue), D261-D270. 

International Human Genome Sequencing Consortium (2004). Finishing the 
euchromatic sequence of the human genome. Nature 431, 931-945. 

Kerrien, S., Aranda, B., Breuza, L., Bridge, A., Broackes-Carter, F., Chen, C., 
Duesbury, M., Dumousseau, M., Feuermann, M., Hinz, U., et al. (2012). The 
IntAct molecular interaction database in 2012. Nucleic Acids Res. 40 (Data- 
base issue), D841-D846. 

Keshava Prasad, T.S., Goel, R., Kandasamy, K., Keerthikumar, S., Kumar, S., 
Mathivanan, S., Telikicherla, D., Raju, R., Shafreen, B., Venugopal, A., et al. 
(2009). Human Protein Reference Database— 2009 update. Nucleic Acids 
Res. 37 (Database issue), D767-D772. 

Kim, M.S., Pinto, S.M., Getnet, D., Nirujogi, R.S., Manda, S.S., Chaerkady, R., 
Madugundu, A.K., Kelkar, D.S., Isserlin, R., Jain, S., et al. (2014). A draft map of 
the human proteome. Nature 509, 575-581. 

Kornblau, S.M., Qiu, Y.H., Zhang, N., Singh, N., Faderl, S., Ferrajoli, A., York, 
H., Qutub, A.A., Coombes, K.R., and Watson, D.K. (2011). Abnormal 
expression of FLI1 protein is an adverse prognostic factor in acute myeloid leukemia. 
Blood 118, 5604-5612. 

Lawrence, M.S., Stojanov, P., Mermel, C.H., Robinson, J.T., Garraway, L.A., 
Golub, T.R., Meyerson, M., Gabriel, S.B., Lander, E.S., and Getz, G. (2014). 
Discovery and saturation analysis of cancer genes across 21 tumour types. 
Nature 505, 495-501. 

Licata, L., Briganti, L., Peluso, D., Perfetto, L., Iannuccelli, M., Galeota, E., 
Sacco, F., Palma, A., Nardozza, A.P., Santonico, E., et al. (2012). MINT, the 
molecular interaction database: 2012 update. Nucleic Acids Res. 40 (Database 
issue), D857-D861. 

Loveys, D.A., Streiff, M.B., and Kato, G.J. (1996). E2A basic-helix-loop-helix 
transcription factors are negatively regulated by serum growth factors and 
by the Id3 protein. Nucleic Acids Res. 24, 2813-2820. 

Mann, K.M., Ward, J.M., Yew, C.C., Kovochich, A., Dawson, D.W., Black, 
M.A., Brett, B.T., Sheetz, T.E., Dupuy, A.J., Chang, D.K., et al.; Australian 
Pancreatic Cancer Genome Initiative (2012). Sleeping Beauty mutagenesis re- 
veals cooperating mutations and pathways in pancreatic adenocarcinoma. 
Proc. Natl. Acad. Sci. USA 109, 5934-5941. 

Manning, G., Whyte, D.B., Martinez, R., Hunter, T., and Sudarsanam, S. (2002). 
The protein kinase complement of the human genome. Science 298, 
1912-1934. 

Nakashima, A., Kawamoto, T., Honda, K.K., Ueshima, T., Noshiro, M., Iwata, 
T., Fujimoto, K., Kubo, H., Honma, S., Yorioka, N., et al. (2008). DEC1 
modulates the circadian phase of clock gene expression. Mol. Cell. Biol. 28, 
4080-4092. 

Nyfeler, B., Michnick, S.W., and Hauri, H.P. (2005). Capturing protein 
interactions in the secretory pathway of living cells. Proc. Natl. Acad. Sci. USA 102, 
6350-6355. 

Olsen, J.V., Vermeulen, M., Santamaria, A., Kumar, C., Miller, M.L., Jensen, 
L.J., Gnad, F., Cox, J., Jensen, T.S., Nigg, E.A., et al. (2010). Quantitative phos- 
phoproteomics reveals widespread full phosphorylation site occupancy during 
mitosis. Sci. Signal. 3, ra3. 

Payne, K.J., and Dovat, S. (2011). Ikaros and tumor suppression in acute 
lymphoblastic leukemia. Crit. Rev. Oncog. 16, 3-12. 






Ramachandran, N., Raphael, J.V., Hainsworth, E., Demirkan, G., Puentes, 
M.G., Rolfs, A., Hu, Y., and LaBaer, J. (2008). Next-generation high-density 
self-assembling functional protein arrays. Nat. Methods 5, 535-538. 

Richter, J., Schlesner, M., Hoffmann, S., Kreuz, M., Leich, E., Burkhardt, B., 
Rosolowski, M., Ammerpohl, O., Wagener, R., Bernhart, S.H., et al.; ICGC 
MMML-Seq Project (2012). Recurrent mutation of the ID3 gene in Burkitt lym- 
phoma identified by integrated genome, exome and transcriptome 
sequencing. Nat. Genet. 44, 1316-1320. 

Rozenblatt-Rosen, O., Deo, R.C., Padi, M., Adelmant, G., Calderwood, M.A.,
Rolland, T., Grace, M., Dricot, A., Askenazi, M., Tavares, M., et al. (2012).
Interpreting cancer genomes using systematic host network perturbations by
tumour virus proteins. Nature 487, 491-495.

Rual, J.-F., Venkatesan, K., Hao, T., Hirozane-Kishikawa, T., Dricot, A., Li, N., 
Berriz, G.F., Gibbons, F.D., Dreze, M., Ayivi-Guedehoussou, N., et al. (2005). 
Towards a proteome-scale map of the human protein-protein interaction 
network. Nature 437, 1173-1178. 

Ruepp, A., Waegele, B., Lechner, M., Brauner, B., Dunger-Kaltenbach, I.,
Fobo, G., Frishman, G., Montrone, C., and Mewes, H.W. (2010). CORUM:
the comprehensive resource of mammalian protein complexes--2009.
Nucleic Acids Res. 38 (Database issue), D497-D501.

Salwinski, L., Miller, C.S., Smith, A.J., Pettit, F.K., Bowie, J.U., and Eisenberg,
D. (2004). The Database of Interacting Proteins: 2004 update. Nucleic Acids
Res. 32 (Database issue), D449-D451.

Stagljar, I., and Fields, S. (2002). Analysis of membrane protein interactions using
yeast-based technologies. Trends Biochem. Sci. 27, 559-563.

Stein, A., Ceol, A., and Aloy, P. (2011). 3did: identification and classification of 
domain-based interactions of known three-dimensional structure. Nucleic 
Acids Res. 39 (Database issue), D718-D723. 

Stelzl, U., Worm, U., Lalowski, M., Haenig, C., Brembeck, F.H., Goehler, H.,
Stroedicke, M., Zenkner, M., Schoenherr, A., Koeppen, S., et al. (2005). A human
protein-protein interaction network: a resource for annotating the proteome.
Cell 122, 957-968.

Stenson, P.D., Mort, M., Ball, E.V., Shaw, K., Phillips, A., and Cooper, D.N.
(2014). The Human Gene Mutation Database: building a comprehensive mutation
repository for clinical and molecular genetics, diagnostic testing and
personalized genomic medicine. Hum. Genet. 133, 1-9.

Stumpf, M.P., Thorne, T., de Silva, E., Stewart, R., An, H.J., Lappe, M., and 
Wiuf, C. (2008). Estimating the size of the human interactome. Proc. Natl. 
Acad. Sci. USA 105, 6959-6964. 



Thomas, G., Jacobs, K.B., Yeager, M., Kraft, P., Wacholder, S., Orr, N., Yu, K., 
Chatterjee, N., Welch, R., Hutchinson, A., et al. (2008). Multiple loci identified in 
a genome-wide association study of prostate cancer. Nat. Genet. 40, 
310-315. 

Venkatesan, K., Rual, J.-F., Vazquez, A., Stelzl, U., Lemmens, I., Hirozane-Kishikawa,
T., Hao, T., Zenkner, M., Xin, X., Goh, K.I., et al. (2009). An empirical
framework for binary interactome mapping. Nat. Methods 6, 83-90.

Vidal, M., Cusick, M.E., and Barabasi, A.-L. (2011). Interactome networks and 
human disease. Cell 144, 986-998. 

Vogelstein, B., Papadopoulos, N., Velculescu, V.E., Zhou, S., Diaz, L.A., Jr.,
and Kinzler, K.W. (2013). Cancer genome landscapes. Science 339, 1546-1558.

von Mering, C., Huynen, M., Jaeggi, D., Schmidt, S., Bork, P., and Snel, B.
(2003). STRING: a database of predicted functional associations between proteins.
Nucleic Acids Res. 31, 258-261.

Wilhelm, M., Schlegl, J., Hahne, H., Moghaddas Gholami, A., Lieberenz, M., 
Savitski, M.M., Ziegler, E., Butzmann, L., Gessulat, S., Marx, H., et al. 
(2014). Mass-spectrometry-based draft of the human proteome. Nature 509, 
582-587. 

Wölfel, T., Hauer, M., Schneider, J., Serrano, M., Wölfel, C., Klehmann-Hieb,
E., De Plaen, E., Hankeln, T., Meyer zum Büschenfelde, K.H., and Beach, D.
(1995). A p16INK4a-insensitive CDK4 mutant targeted by cytolytic T lymphocytes
in a human melanoma. Science 269, 1281-1284.

Woodsmith, J., and Stelzl, U. (2014). Studying post-translational modifications 
with protein interaction networks. Curr. Opin. Struct. Biol. 24, 34-44. 

Yang, X., Boehm, J.S., Yang, X., Salehi-Ashtiani, K., Hao, T., Shen, Y., Lubonja,
R., Thomas, S.R., Alkan, O., Bhimdi, T., et al. (2011). A public
genome-scale lentiviral expression library of human ORFs. Nat. Methods 8,
659-661.

Yu, H., Tardivo, L., Tam, S., Weiner, E., Gebreab, F., Fan, C., Svrzikapa, N.,
Hirozane-Kishikawa, T., Rietman, E., Yang, X., et al. (2011). Next-generation
sequencing to generate interactome datasets. Nat. Methods 8, 478-480.

Zhang, Q.C., Petrey, D., Deng, L., Qiang, L., Shi, Y., Thu, C.A., Bisikirska, B.,
Lefebvre, C., Accili, D., Hunter, T., et al. (2012). Structure-based prediction
of protein-protein interactions on a genome-wide scale. Nature 490, 556-560.

Zhong, Q., Simonis, N., Li, Q.R., Charloteaux, B., Heuze, F., Klitgord, N., Tam,
S., Yu, H., Venkatesan, K., Mou, D., et al. (2009). Edgetic perturbation models
of human inherited disorders. Mol. Syst. Biol. 5, 321.



1226 Cell 159, 1212-1226, November 20, 2014 ©2014 Elsevier Inc. 




Errata 



Cell 



Molecular Architecture of the
40S•eIF1•eIF3 Translation
Initiation Complex

Jan P. Erzberger,* Florian Stengel, Riccardo Pellarin, Suyang Zhang, Tanja Schaefer, Christopher H.S. Aylett, 

Peter Cimermancic, Daniel Boehringer, Andrej Sali, Ruedi Aebersold, and Nenad Ban* 

*Correspondence: jan.erzberger@mol.biol.ethz.ch (J.P.E.), ban@mol.biol.ethz.ch (N.B.) 
http://dx.doi.org/10.1016/j.cell.2014.11.001

(Cell 158, 1123-1135; August 28, 2014)

In preparing the article above, we inadvertently assigned the coordinates for eIF3f to its homolog eIF3h and vice versa within the
mammalian eIF3 model during PDB database deposition. This error led to improper positioning of these subunits in Figures 3E,
7F, and S2B. We present below revised panels for Figures 3 and 7 with the correct assignment for eIF3f and eIF3h. These figures,
along with Figure S2, have been corrected online. The deposited PDB files have also been corrected and updated. In addition,
we have added the missing legend for Figure S2B online. We apologize for any confusion that these errors may have caused.
Figure 3. Docking of eIF3a/eIF3c in the PCI•MPN Core Density of Mammalian 43S and 43S•IRES EM Maps
ATRX Directs Binding of PRC2 to Xist
RNA and Polycomb Targets

Kavitha Sarma, Catherine Cifuentes-Rojas, Ayla Ergun, Amanda del Rosario, Yesu Jeon, Forest White, Ruslan Sadreyev, 
and Jeannie T. Lee* 

*Correspondence: lee@molbio.mgh.harvard.edu 
http://dx.doi.org/10.1016/j.cell.2014.11.010

(Cell 159, 869-883; November 6, 2014)

During the final revision process, we inadvertently removed a key reference to prior work linking ATRX to macroH2A. On page 871,
the sentence “ATRX caught our attention, as it was shown to be enriched on the Xi by immunofluorescence (Baumann and De La
Fuente, 2009) and has an ATPase and helicase domain (Clynes et al., 2013; Ratnakumar and Bernstein, 2013)” should have cited
the paper by Ratnakumar et al. (2012). The text and reference list have been corrected online.
Ratnakumar, K., Duarte, L.F., LeRoy, G., Hasson, D., Smeets, D., Vardabasso, C., Bonisch, C., Zeng, T., Xiang, B., Zhang, D.Y., et al. (2012). ATRX-mediated
chromatin association of histone variant macroH2A1 regulates α-globin expression. Genes Dev. 26, 433-438.
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Microbiota maintains 
host homeostasis 

Medzhitov 

Vol. 118, 229-241

"This paper proposed the
notion that immune ignorance
to the microbiota, the
prevailing theory at the time,
may not hold true in all cases."
nature of demethylation...."
nailed this story."
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Th17cells 

Cua and Littman 
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As we commemorate 40 years of publishing 
exciting biology we’re also embracing the future. 
We will continue to lead the way with best-in-class 
editorial service, innovations in publishing, and 
new technology to address the evolving needs of 
scientists. Here’s to the next 40 years! 

www.cell.com/40/home
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Cell Line: 2004-2014 




2014 marks Cell’s 40th anniversary, and over the year we have looked back at how discoveries of the last four decades have 
molded our understanding of biology. The final decade of the Cell Line features a selection of the exceptional scientific work— 
both landmark papers and essential reviews. Select entries can be read as an “Annotated Classic,” which includes the original 
paper and accompanying reflections of a leading scientist, considering the work from our current vantage point. Our last install- 
ment includes a harbinger of the interplay between microbiota and mammalian hosts in 2004, revolutionary papers in 2006 and 
2007 unlocking cellular reprogramming, the discovery of beige adipocytes in 2012, and the first example of CRISPR-based 
genome editing in a non-human primate in 2014. In addition to landmark publications, there were innovative developments at 
the journal in this decade, with the complete redesign of the print journal and the creation of Leading Edge in late 2005 and the 
restructuring of the online display of the article in 2010. Keeping pace with the changing nature of biological research, over the 
decade Cell added new article types, introduced guidelines for the organization of supplementary material and expanded the 
journal’s web-based content to bring editors’ and authors’ excitement and perspective on individual papers to the readership. 

An interactive version of the timeline, with links to the papers, full author lists, and Annotated Classics, is available at 
http://dx.doi.org/10.1016/j.cell.2014.11.004.

The Cell Editorial Team 
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authors on camera 




Pluripotent SCs from 
human fibroblasts 

Yamanaka



Vol. 131, 861-872 



Lady Gaga parody 
goes viral - 
“I want good data. 
A paper in Cell” 
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Cell Picture Show 
celebrates beauty 
in science 




lncRNAs in
HOX regulation

Rinn and Chang

Vol. 129, 1311-1323

"Another surprising result:
the first demonstration
that a long noncoding RNA
can act in trans to regulate
a chromatin domain."



Tetherin ligates HIV 

Bieniasz 

Vol. 139, 499-511 



ChIP-seq for hi-res 
nucleosome positioning 

Zhao 

Vol. 132, 887-898 



Chromothripsis 

Stratton, Futreal, and Campbell 
Vol. 144, 27-40

"Peter Campbell and colleagues
made a truly remarkable
observation noting cancer
genomes in which DNA
rearrangements occurred in a
highly focal manner - with many
DNA rearrangements basically
arising on a single chromosome."
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scientific 

perspectives 



Spatial organization of 
the mouse genome 

Alt and Dekker 
Vol. 148, 908-921 




Circulating factor reverses 
cardiac aging phenotype 

Wagers and Lee 
Vol. 153, 828-839 



Genome editing in a 
nonhuman primate 

Sha 

Vol. 156, 836-843 



RNA granules as hydrogels 

McKnight 
Vol. 149, 753-767 
Vol. 149, 768-779 



Human somatic cell 
nuclear transfer 

Mitalipov 

Vol. 153, 1228-1238 



Evolutionary history from 
whole-genome sequences 
of hunter-gatherers 

Tishkoff 

Vol. 150, 457-469 






Rapid genome 
editing in mice 

Jaenisch 

Vol. 154, 1370-1379 



2012 



2013 



2014 







Hallmarks of Cancer for 
a new decade 

Hanahan and Weinberg 
Vol. 144, 646-674 



Personal 'omics' 

Snyder 

Vol. 148, 1293-1307 
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Beige adipocytes 

Spiegelman 
Vol. 150, 366-376 




Base-resolution 
sequencing of 5hmC 

Ren 

Vol. 149, 1368-1380 
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Mouse cortical 
connectomics 

Dong 

Vol. 156, 1096-1111 




cGAMP generation and 
immune function 

Patel 

Vol. 153, 1094-1107 
Vol. 154, 748-762 
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Join or Renew Your 
Membership Today! 

New for 2013, become a Regular Member of ASHG and get a discount on 
Regular Membership with the European Society of Human Genetics! 

$25 - $70: Trainee Members
$190: Regular Members
Multi-year discounts now available

Membership benefits include:

• Reduced fees for the 2014 ASHG Annual meeting
• Free access to invited session recordings from past ASHG Annual Meetings through the membership web portal
• Subscription to The American Journal of Human Genetics
• Influence on Public Policy Issues
• Access to Educational Materials
• Discount to Journal Subscriptions (Nature, Nature Reviews Genetics, and more!)

For more information visit: www.ashg.org/join

If you have questions regarding membership status, please contact us:

The American Society of Human Genetics
9650 Rockville Pike • Bethesda, MD 20814-3998

1.866.HUMGENE • Membership: scelia@ashg.org



Columbia University

IN THE CITY OF NEW YORK

The 2015 Louisa Gross Horwitz Prize for Biology or Biochemistry

The Louisa Gross Horwitz Prize was established under the will of the late S. Gross Horwitz through a bequest to Columbia University and is 
named to honor the donor’s mother. Louisa Gross Horwitz was the daughter of Dr. Samuel David Gross (1805-1889), a prominent surgeon of 
Philadelphia and author of the outstanding Systems of Surgery who served as President of the American Medical Association. 

Each year since its inception in 1967, the Louisa Gross Horwitz Prize has been awarded by Columbia University for outstanding basic 
research in the fields of biology or biochemistry. The purpose of this award is to honor a scientific investigator or group of investigators whose 
contributions to knowledge in either of these fields are deemed worthy of special recognition. 

The Prize consists of an honorarium and a citation which are awarded at a special presentation event. Unless otherwise recommended by the 
Prize Committee, the Prize is awarded annually. Dr. James P. Allison, University of Texas MD Anderson Cancer Center was the 2014 awardee. 

QUALIFICATIONS FOR THE AWARD 

The Prize Committee recognizes no geographical limitations. The Prize may be awarded to an individual or a group. When the Prize is awarded 
to a group, the honorarium will be divided among the recipients, but each member will receive a citation. Preference will be given to work 
done in the recent past. 

Nominations must be submitted electronically at: http://www.cumc.columbia.edu/research/horwitz-prize 
All communications and materials must be written in the English language. 

Re-nomination(s) are by invitation only. 

Self-nominations are not permitted. 

Nominations should include: 

1) A summary, no more than 500 words long, of the research on which this nomination is based. 

2) A summary, no more than 500 words long, of the significance of this research in the fields of biology or biochemistry. 

3) A brief biographical sketch of the nominee, including positions held and awards received by the nominee. 

4) A listing of up to ten of the nominee’s most significant publications relating to the research noted under item 1. 

5) A copy of the nominee’s curriculum vitae. 

Deadline date: January 30, 2015 
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MAYO 

CLINIC 






BIOMEDICAL DISCOVERY SCIENTIST 



The Department of Biochemistry and Molecular Biology at Mayo Clinic in
Rochester, MN is seeking an outstanding scientist to align with its newly
designated Center for Biomedical Discovery. The individual will develop and/or
maintain a nationally and internationally recognized extramurally funded research
program. Candidates should hold a Ph.D., M.D., or M.D./Ph.D. and have
expertise in fibrosis, senescence, or cancer.

Mayo Clinic has been recognized as the best hospital in the nation for
2014-2015 by U.S. News and World Report and is consistently recognized by
FORTUNE magazine as one of the “100 Best Companies to Work For.” We are
an integrated, multidisciplinary academic medical center with comprehensive
programs in medical education and research that span across three group
practice sites. We support a vibrant research enterprise, with programs in
clinical, basic and population sciences. In 2013, the institution received nearly
$370 million in extramural research awards, supplemented with over $275
million of institutional support. Mayo Clinic has a large number of outstanding
laboratory and shared resource facilities, including a Medical Genomics
Facility, a Mouse Modeling Facility, Molecular and Microscopic Core Facilities,
and unique Human Disease Tissue Banks. We offer a highly competitive
compensation package with sustained intramural funding, capital equipment
funding, technical and computational resources, and exceptional benefits.

To apply online, please visit: http://mayocl.in/1zfshFU

For further information please contact: Edward B. Leof, Ph.D.,
Search Committee Chair, at Schilbe.Jennifer@mayo.edu



Heal the sick, advance the science, share the knowledge. 



©2014 Mayo Foundation for Medical Education and Research. Post offer/pre-employment drug screening is required. 
Mayo Clinic is an equal opportunity educator and employer (including veterans and persons with disabilities). 




STANFORD UNIVERSITY 

DEPARTMENT OF CHEMICAL AND SYSTEMS BIOLOGY 



The Department of Chemical and Systems Biology at Stanford University School 
of Medicine invites applications for a tenure-track position at the ASSISTANT 
PROFESSOR level. We are particularly interested in candidates with a strong 
interdisciplinary record in the broad areas of chemical biology, systems biology, 
and/or cellular and molecular biology in normal and disease states. Stanford offers 
an outstanding environment for creative interdisciplinary biomedical research. The 
main criterion for appointment in the University Tenure Line is a major commitment 
to research and teaching. For more information on our department, please visit: 
http://chemsysbio.stanford.edu/. 

Candidates should have a Ph.D. and/or M.D. degree and postdoctoral 
research experience. Applications should include a cover letter addressed to 
Tobias Meyer, Ph.D., Search Committee Chair; a curriculum vitae, 
publication list, description of future research plans, and at least 3 letters of 
reference. All materials should be submitted online to Academic Jobs Online at
https://academicjobsonline.org/ajo/jobs/4602.

To ensure full consideration, please submit your applications by December 1, 2014. 
Late applications may be considered. 

Questions should be addressed to Kathy Johnson,
Faculty Affairs Administrator, at csbsearch@stanford.edu.

Stanford University is an equal opportunity employer and is committed to increasing
the diversity of its faculty. It welcomes nominations of and applications from women,
members of minority groups, protected veterans and individuals with disabilities,
as well as from others who would bring additional dimensions to the university’s
research, teaching and clinical missions.




Be their next 
big discovery. 

careers.cell.com 

Where top life sciences talent is searching
for their next opportunity




Career Network 



Discover the best jobs in life science 

To showcase your job openings on Cell Career Network, 
contact: advertising@cell.com 
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Postdoctoral 
Program 

Novartis Institutes for
BioMedical Research



Do you have a passion for innovative 
fundamental research in drug discovery? 

We are seeking creative postdoctoral scholars to 
conduct exciting research in biology, chemistry, 
and computational sciences at the frontier of 
drug discovery. 

Postdocs at NIBR who will subsequently pursue 
faculty positions in academia are eligible for 
start-up funds through our new and highly 
competitive Young Investigator Awards. 

Visit http://postdoc.nibr.com to view our
mentors’ research profiles and to apply.
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Assistant Professor 

Departments of Cancer Biology, Dana-Farber Cancer Institute 
and Cell Biology, Harvard Medical School 

The Departments of Cancer Biology at the Dana-Farber Cancer Institute and Cell Biology at Harvard Medical School invite applicants for a
tenure-track faculty position at the rank of Assistant Professor or Associate Professor. We are seeking individuals with a demonstrated potential
for imaginative research and who propose to work on exciting problems in any area of Molecular Metabolism. We are especially interested
in candidates who utilize metabolomics, proteomics or both to uncover novel metabolic pathways in health and diseased states. The successful
candidate will be expected to direct innovative and independent research and participate in the teaching of graduate and/or medical students.
Our highly interactive environment provides the opportunity to engage and collaborate with other dedicated researchers both within the
Division of Metabolism and Chronic Disease of the Cancer Biology Department (chaired by Bruce Spiegelman) and throughout the diverse
Harvard research community. Significant scholarly and scientific resources will be made available for this appointment. Applicants will
be housed in new space at the Dana-Farber Cancer Institute. For further information about our Department, please see our web page:
http://www.dana-farber.org/Research/Departments-and-Centers/Department-of-Cancer-Biology.aspx

Applicants should submit electronic copies of their curriculum vitae, a description of research accomplishments and future
research interests (three pages maximum), and ask at least three references to provide letters of recommendation.
These materials should be submitted using the following link: https://academicpositions.harvard.edu/postings/5815.
Please contact Kim Wilkinson (Kim_Wilkinson@dfci.harvard.edu) with any questions regarding submission of documents.

Applications must be received by: February 28, 2015. 

We are an equal opportunity employer and all qualified applicants will receive consideration for employment without regard to race, color, religion,
sex, national origin, disability status, protected veteran status, or any other characteristic protected by law.
USC University of
Southern California



Faculty Position 

Division of Hematology, Department of Medicine 
University of Southern California Keck School of Medicine 

Supported by a $58 million endowment, the Jane Anne Nohl Division of 
Hematology and Center for the Study of Blood Diseases and the USC 
Norris Comprehensive Cancer Center at the Keck School of Medicine 
seek faculty members interested in basic, translational and clinical research 
in novel treatment approaches for blood cancers. Candidates working 
in the areas of chimeric antigen receptors, allogeneic stem cell graft engineering,
adoptive cell therapy, TCR gene therapy, immune mobilizing
monoclonal T cells against cancer (ImmTac), peptide vaccination,
antibody drug conjugates, bispecific antibodies, molecularly targeted drugs
or other novel treatment approaches against blood cancers are strongly
encouraged to apply. Preference will be given to applicants who have MD
or MD/PhD degrees, BC/BE in Hematology and/or Medical Oncology and
eligibility for a California medical license. Successful candidates will
receive competitive compensation and appointment at the appropriate
academic level (Assistant/Associate/Full Professor) commensurate with
experience and accomplishments. 

Applicants should send letter of interest, CV and 
names of three references to 
Preet Chaudhary, MD, PhD 
Chief of Nohl Division of Hematology 
Keck School of Medicine of USC 
1441 Eastlake Avenue, NOR 3470, MC 9172 
Los Angeles, CA 90033 
Email: Preet.Chaudhary@med.usc.edu. 



Assistant/Associate Professor of 
Microbiology and Immunobiology 

Metabolism/Microbiome 

The Joslin Diabetes Center, Boston, Massachusetts, invites applications 
from laboratory-based investigators with a focus on understanding reciprocal 
interactions between the microbiome and metabolism. The successful applicant 
will be recommended for full-time appointment as a tenure-track Assistant or 
Associate Professor, commensurate with accomplishments. This individual will 
develop an independent research program aimed at mechanistic investigations 
that will clarify the influence of microbiota on metabolic disorders and the 
development of diabetes and its complications, collaborating closely with faculty 
at the Joslin and Harvard Medical School interested in obesity, autoimmunity and 
inflammation. The candidate will be based at the Joslin Diabetes Center and will 
be a member of the Department of Microbiology and Immunobiology, Harvard 
Medical School. 

This position includes a highly competitive compensation and laboratory start-up 
package. Applicants must have an MD and/or PhD and a proven track record of 
outstanding, innovative research. 

Interested candidates should submit curriculum vitae, reprints of up to three 
of their most significant publications, a brief statement of present and future 
research plans (3-5 pages), and three letters of reference to Professor Amy 
Wagers, Chair of the Search Committee. 

Materials should be submitted via email to: 

microbiome@joslin.harvard.edu. 

The submission deadline for receipt of all application materials is December 15, 
2014. We strongly encourage applications from women and minority candidates. 
Joslin is the world’s largest diabetes research and clinical care organization, and 
an independent, nonprofit institution affiliated with Harvard Medical School. Joslin 
is dedicated to ensuring that people with diabetes live long, healthy lives and 
offers real hope and progress toward diabetes prevention and a cure. 

Our mission is to prevent, treat and cure diabetes. 

Our vision is a world free of diabetes and its complications.

Joslin Diabetes Center is an equal opportunity employer and all qualified applicants will receive
consideration for employment without regard to race, color, religion, sex, national origin, disability
status, protected veteran status, or any other characteristic protected by law.
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UNCOVER 
NEW INSIGHTS 
TO HELP YOU 
IDENTIFY POTENTIAL 
BIOLOGICAL TARGETS 



Target Insights is a support tool that helps users find 
important facts that other search tools may miss, because 
it is based upon an automated analysis of the full text 
of scientific articles. Target Insights gets you to the right 
target quicker and with more confidence than other tools. 

Developed by Elsevier to meet increasing demands on speed and 
efficiency it enables target identification and validation teams to discover 
new insights and update their understanding of the disease model by 
revealing crucial information which is normally obscured and hard-to-find, 
in three steps... Identify, Prioritize, Validate. 
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targetinsights 

IDENTIFY PRIORITIZE VALIDATE 





ELSEVIER 



For the latest information visit 

targetinsights.com/info 

Copyright © 2012 Elsevier Properties SA. All rights reserved. 

Target Insights™ is owned by Elsevier Properties SA and used under license. 






Discover high-quality, 
open-access research 



Cell Reports features: 

■ High-quality, cutting-edge research 

■ A focus on short, single-point papers
called Reports

■ Broad scope covering all of biology 

■ Flexible open-access policy 

■ A highly engaged editorial board 

■ A distinguished advisory board 

■ New papers online weekly 



Inaugural Issue

January 26, 2012



cellreports.cell.com 








Cell Press
content
is widely
accessible



At Cell Press we place a high priority on ensuring that all of 
our journal content is widely accessible and on working with 
the community to develop the best ways to achieve that goal. 



Here are just some of those initiatives... 



Open archives 

We provide free access to Cell Press 
research journals 12 months following 
publication 


Access for developing 
nations 

We provide free & low-cost access 
through programs like Research4Life 


Open access journal 

We launched Cell Reports - a new 
Open Access journal spanning the life 
sciences 


Funding body agreements 

We work cooperatively and 
successfully with major funding 
bodies 


Public access 

Full-text online via ScienceDirect is 
also available to the public via walk 
in user access from any participating 
library 


Submission to PubMed 
Central 

Cell Press deposits accepted 
manuscripts on our authors' behalf for 
a variety of funding bodies, including 
NIH and HHMI, to PubMed Central 
(PMC) 



www.cell.com/cellpress/access














We recognize 
what matters 
in science. 



Our editors select reviewers appropriate for 
your paper and will help you interpret their 
comments - striking a balance between the 
ideal and the reasonable. 

When you choose Cell Press you get the 
attention you deserve. 

For more information visit: www.cell.com/values 
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Cell Press 
Snapshots at 
your fingertips 




Access our Snapshot collection as well as new Snapshots
every two weeks for free! Snapshots are handy, one-page
reference guides that include up-to-date tables of
nomenclature and glossaries, signaling pathways,
schematic diagrams of cellular processes, concepts in
development and disease, and much more.



■ All free 

■ Beautiful images 



■ Great for studying 

■ Searchable 
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cell.com/mobile 
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Discover the 
latest Trends 
in your field 



Cell Press Trends journals feature: 

■ Cutting-edge Review and Opinion articles 




Trends in Biochemical Sciences
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■ Authoritative, succinct and accessible 
content 

■ Discussion, analysis and debate 
For more information visit 

cell.com/trends 
Parasitology
• Access to the 14 Cell Press primary research journals and
14 Trends reviews titles, all on the same platform
• Improved, more robust article and author search



• Video, animations and sound files 

• Easy to navigate home page, articles pages and archive 
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App Features: 

• Available to subscribers at no additional charge 

• Free 30-day trial to all 30 Cell Press journals 

• Interactive reading experience 

• Browse abstracts and full-text articles 

• Download now, read later 

• Make article notes 

• Share with your friends and colleagues 

• Now on the iPhone, iPad, Android tablet and Android phone 



Download the app at 
cell.com/mobile 
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Simply stunning 



EVOS cell imaging systems 



Minimize the complexities of microscopy without compromising performance 
Smarter systems | Easier cell imaging | Faster results

Compare systems at lifetechnologies.com/simplystunning
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